Nguyen, M.-T. and Phan, V.-A. and Linh, L.T. and Son, N.H. and Dung, L.T. and Hirano, M. and Hotta, H. (2020) Transfer Learning for Information Extraction with Limited Data. In: 16th International Conference of the Pacific Association for Computational Linguistics, PACLING 2019, 11 October 2019 through 13 October 2019.
Full text not available from this repository.

Abstract
This paper presents a practical approach to fine-grained information extraction. The authors' experience in applying information extraction to business process automation reveals two fundamental technical challenges: (i) labeled data is usually limited and (ii) highly detailed classification is required. The main idea of our proposal is to leverage transfer learning, that is, to reuse a pre-trained deep neural network model, in combination with common statistical classifiers to determine the class of each extracted term. To do so, we first use BERT to cope with the limited training data available in real scenarios, then stack BERT with Convolutional Neural Networks to learn hidden representations for classification. To validate our approach, we applied our model to an actual document processing case using public data on competitive bids for development projects in Japan. We used 100 documents for training and testing and confirmed that the model can extract fine-grained named entities at the level of detail required by the targeted business process, such as the department name of application receivers. © 2020, Springer Nature Singapore Pte Ltd.
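The stacking idea in the abstract (a pre-trained encoder feeding a convolutional layer, followed by a classifier) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: `placeholder_encoder` is a hypothetical stand-in that returns random vectors where a real system would return BERT embeddings, the convolution is assumed to operate on the encoder output, the linear scorer stands in for whatever classifier the paper uses, the label set is invented for illustration, and all weights are untrained, so the predicted label carries no meaning.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible


def placeholder_encoder(tokens, dim=8):
    """Hypothetical stand-in for a pre-trained encoder such as BERT.

    Returns one random vector per token; a real system would return
    contextual embeddings from the pre-trained model instead.
    """
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in tokens]


def conv_max_pool(vectors, kernels):
    """1D convolution over the token axis, ReLU, then max-over-time pooling."""
    features = []
    for kernel in kernels:
        width = len(kernel)  # number of consecutive tokens the kernel spans
        best = 0.0  # starting at 0 applies the ReLU floor
        for start in range(len(vectors) - width + 1):
            window = vectors[start:start + width]
            activation = sum(
                w * x
                for k_row, v_row in zip(kernel, window)
                for w, x in zip(k_row, v_row)
            )
            best = max(best, activation)
        features.append(best)
    return features


def classify(features, weights, labels):
    """Linear scorer (an assumption): return the label with the highest score."""
    scores = [sum(w * f for w, f in zip(row, features)) for row in weights]
    return labels[scores.index(max(scores))]


DIM, N_KERNELS, WIDTH = 8, 4, 3
LABELS = ["department_name", "deadline", "other"]  # hypothetical classes

# Untrained random parameters, for shape illustration only.
kernels = [
    [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(WIDTH)]
    for _ in range(N_KERNELS)
]
weights = [[rng.uniform(-1, 1) for _ in range(N_KERNELS)] for _ in LABELS]

tokens = ["submit", "to", "the", "procurement", "department"]
features = conv_max_pool(placeholder_encoder(tokens, DIM), kernels)
predicted = classify(features, weights, LABELS)
```

In a transfer-learning setting, only the convolutional and classification parameters (and optionally the encoder) would be fitted on the small labeled set, which is what makes the approach plausible with as few as 100 documents.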
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Divisions: | Faculties > Faculty of Information Technology |
Identification Number: | 10.1007/978-981-15-6168-9_38 |
Uncontrolled Keywords: | Administrative data processing; Computational linguistics; Convolutional neural networks; Data mining; Deep learning; Deep neural networks; Information retrieval; Transfer learning; Business Process; Business process automation; Detailed classification; Development project; Document-processing; Statistical classifier; Technical challenges; Training and testing; Classification (of information) |
Additional Information: | Conference code: 241929. Language of original document: English. All Open Access, Green. |
URI: | http://eprints.lqdtu.edu.vn/id/eprint/9138 |