Information Extraction of Domain-specific Business Documents with Limited Data

Cited by: 2
Authors
Minh-Tien Nguyen [1, 2]
Le Thai Linh [1 ]
Dung Tien Le [1 ]
Nguyen Hong Son [1 ]
Do Hoang Thai Duong [1 ]
Bui Cong Minh [1 ]
Akira Shojiguchi [1 ]
Affiliations
[1] CINNAMON LAB, 10th Floor, Geleximco Bldg, 36 Hoang Cau, Hanoi, Vietnam
[2] Hung Yen Univ Technol & Educ, Hung Yen, Vietnam
Source
2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | 2021
Keywords
Information extraction; Document analysis;
DOI
10.1109/IJCNN52387.2021.9534328
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Information extraction is a cornerstone of the digitization of office data, which requires converting unstructured data into structured data. In real business applications, however, adapting common extraction systems to domain-specific documents is a major obstacle because only limited training data can be prepared. To overcome this issue, we introduce a model that combines pre-trained language models with a customized CNN layer for domain adaptation. The model is validated on three Japanese domain-specific data sets and two benchmark machine reading comprehension data sets (SQuADs). Experimental results confirm that the model achieves promising results that are applicable to actual business scenarios.
Pages: 8
Related papers
50 in total
[11]   Extraction and Semantic Representation of Domain-Specific Relations in Spanish Labour Law [J].
Revenko, Artem ;
Martin-Chozas, Patricia .
PROCESAMIENTO DEL LENGUAJE NATURAL, 2022, (69) :105-116
[12]   Domain-Specific Relation Extraction Using Distant Supervision Machine Learning [J].
Aljamel, Abduladem ;
Osman, Taha ;
Acampora, Giovanni .
2015 7TH INTERNATIONAL JOINT CONFERENCE ON KNOWLEDGE DISCOVERY, KNOWLEDGE ENGINEERING AND KNOWLEDGE MANAGEMENT (IC3K), 2015, :92-103
[13]   Research on Information Extraction of Technical Documents and Construction of Domain Knowledge Graph [J].
Zhao, Huaxuan ;
Pan, Yueling ;
Yang, Feng .
IEEE ACCESS, 2020, 8 :168087-168098
[14]   Evaluation of a Complex Information Extraction Application in Specific Domain [J].
Besancon, Romaric ;
Ferret, Olivier ;
Jean-Louis, Ludovic .
LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2012, :2056-2063
[15]   MULTIMEDICA: Multilingual Information Extraction in Health domain and application to scientific and informative documents [J].
Martinez, Paloma ;
Gonzalez-Cristobal, Jose C. ;
Moreno Sandoval, Antonio .
PROCESAMIENTO DEL LENGUAJE NATURAL, 2011, (47) :347-348
[16]   Information Extraction from Multi-Domain Scientific Documents: Methods and Insights [J].
Batura, Tatiana ;
Yerimbetova, Aigerim ;
Mukazhanov, Nurzhan ;
Shvarts, Nikita ;
Sakenov, Bakzhan ;
Turdalyuly, Mussa .
APPLIED SCIENCES-BASEL, 2025, 15 (16)
[17]   Domain-Specific Keyword Extraction Using Joint Modeling of Local and Global Contextual Semantics [J].
Abulaish, Muhammad ;
Fazil, Mohd ;
Zaki, Mohammed J. .
ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2022, 16 (04)
[18]   Cooperative and Fast-Learning Information Extraction from Business Documents for Document Archiving [J].
Esser, Daniel .
ON THE MOVE TO MEANINGFUL INTERNET SYSTEMS: OTM 2013 WORKSHOPS, 2013, 8186 :22-31
[19]   Gain more with less: Extracting information from business documents with small data [J].
Nguyen, Minh-Tien ;
Son, Nguyen Hong ;
Linh, Le Thai .
EXPERT SYSTEMS WITH APPLICATIONS, 2023, 215
[20]   Taking Natural Language Generation and Information Extraction to Domain Specific Tasks [J].
Varma, Sandeep ;
Shivam, Shivam ;
Natarajan, Sarun ;
Biswas, Snigdha ;
Gupta, Jahnvi .
INTELLIGENT SYSTEMS AND APPLICATIONS, VOL 3, INTELLISYS 2023, 2024, 824 :713-728