GraphRevisedIE: Multimodal information extraction with graph-revised network

Cited by: 7
Authors
Cao, Panfeng [1]
Wu, Jian [2]
Affiliations
[1] Univ Michigan, Ann Arbor, MI 48109 USA
[2] Univ Sci & Technol China, Hefei 230026, Anhui, Peoples R China
Keywords
Document information extraction; Graph convolutional network; Transformer; Images
DOI
10.1016/j.patcog.2023.109542
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Key information extraction (KIE) from visually rich documents (VRD) has been a challenging task in document intelligence, both because the complicated and diverse layouts of VRD make it hard for models to generalize and because methods for exploiting the multimodal features in VRD are lacking. In this paper, we propose a lightweight model named GraphRevisedIE that effectively embeds multimodal features such as textual, visual, and layout features from VRD and leverages graph revision and graph convolution to enrich the multimodal embedding with global context. Extensive experiments on multiple real-world datasets show that GraphRevisedIE generalizes to documents of varied layouts and achieves performance comparable to or better than previous KIE methods. We also publish a business license dataset that contains both real-life and synthesized documents to facilitate research on document KIE. © 2023 Elsevier Ltd. All rights reserved.
Pages: 9