Named Entity Recognition and Relation Extraction with Graph Neural Networks in Semi Structured Documents

Times cited: 29
Authors
Carbonell, Manuel [1, 2]
Riba, Pau [1]
Villegas, Mauricio [2]
Fornes, Alicia [1]
Llados, Josep [1]
Affiliations
[1] Univ Autonoma Barcelona, Comp Vis Ctr, Comp Sci Dept, Barcelona, Spain
[2] Omni Us, Berlin, Germany
Source
2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR) | 2021
Keywords
Relation Extraction; Named Entity Recognition; Semi-structured Documents; Administrative Documents; Graph Neural Networks
DOI
10.1109/ICPR48806.2021.9412669
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The use of administrative documents to communicate and keep records of business information requires methods that can automatically extract and understand their content robustly and efficiently. In addition, the semi-structured nature of these reports makes them especially suited to graph-based representations, which are flexible enough to adapt to the deformations introduced by different document templates. Moreover, Graph Neural Networks provide a suitable methodology for learning the relations among the data elements in these documents. In this work, we study the use of Graph Neural Network architectures to tackle entity recognition and relation extraction in semi-structured documents. Our approach achieves state-of-the-art results in the three tasks involved in the process. Additionally, experiments on two datasets of different nature demonstrate the good generalization ability of our approach.
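To make the graph-based idea in the abstract concrete, here is a minimal, illustrative sketch in plain Python. It is not the authors' architecture: the k-nearest-neighbour graph over word positions, the single untrained tanh aggregation step, and the dot-product link scorer are all assumptions chosen for brevity, and the functions `knn_graph`, `message_passing` and `link_score` are hypothetical names. It shows how words become nodes, how neighbouring nodes exchange information, and how a pair of nodes can be scored as related (relation extraction as edge classification).

```python
import math
from typing import Dict, List, Set, Tuple

# Hypothetical input: each word is (x_center, y_center, feature_vector).
# In a real system the features would come from learned text/layout encoders.
Word = Tuple[float, float, List[float]]


def knn_graph(words: List[Word], k: int = 2) -> Dict[int, Set[int]]:
    """Connect every word to its k spatially nearest words (undirected)."""
    adj: Dict[int, Set[int]] = {i: set() for i in range(len(words))}
    for i, (xi, yi, _) in enumerate(words):
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj, _) in enumerate(words)
            if j != i
        )
        for _, j in dists[:k]:
            adj[i].add(j)
            adj[j].add(i)  # keep the graph symmetric
    return adj


def message_passing(h: List[List[float]], adj: Dict[int, Set[int]]) -> List[List[float]]:
    """One untrained round: h_i' = tanh(h_i + mean of neighbour features)."""
    out = []
    for i, hi in enumerate(h):
        nbrs = adj[i]
        if nbrs:
            mean = [sum(h[j][d] for j in nbrs) / len(nbrs) for d in range(len(hi))]
        else:
            mean = [0.0] * len(hi)
        out.append([math.tanh(a + b) for a, b in zip(hi, mean)])
    return out


def link_score(hi: List[float], hj: List[float]) -> float:
    """Sigmoid of the dot product: a score in (0, 1) that two nodes are related."""
    return 1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(hi, hj))))


if __name__ == "__main__":
    # Two spatially separated pairs of words with contrasting toy features.
    words = [
        (0.0, 0.0, [1.0, -1.0]),
        (1.0, 0.0, [1.0, -0.5]),
        (10.0, 10.0, [-1.0, 1.0]),
        (11.0, 10.0, [-0.5, 1.0]),
    ]
    adj = knn_graph(words, k=1)
    h = message_passing([w[2] for w in words], adj)
    print(adj)
    print(round(link_score(h[0], h[1]), 3), round(link_score(h[0], h[2]), 3))
```

With `k=1`, each word links only to its nearest neighbour, so the graph splits into the pairs {0, 1} and {2, 3}. After one aggregation step, nodes within a pair score well above 0.5 and nodes across pairs well below it. In a trained model, the aggregation and scoring would use learned weights rather than these fixed operations.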
Pages: 9622-9627
Number of pages: 6