Multimodal weighted graph representation for information extraction from visually rich documents

Cited by: 5
Authors
Gbada, Hamza [1, 2]
Kalti, Karim [2, 3]
Mahjoub, Mohamed Ali [2]
Affiliations
[1] Univ Sousse, Higher Inst Informat & Commun Technol, Sousse, Tunisia
[2] Natl Engn Sch Sousse ENISo, Lab Adv Technol & Intelligent Syst LATIS, Sousse, Tunisia
[3] Univ Monastir, Fac Sci Monastir, Monastir, Tunisia
Keywords
Information extraction; Visually rich documents; Graph convolutional networks
DOI
10.1016/j.neucom.2023.127223
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper introduces a novel system for information extraction from visually rich documents (VRDs) using a weighted graph representation. The proposed method aims to improve information extraction performance by capturing the relationships between the components of a VRD. The VRD is modeled as a weighted graph, in which visual, textual, and spatial features of text regions are encoded in nodes and edges representing the relationships between neighboring text regions. Information extraction is then performed as a node classification task: the VRD graphs are fed into a graph convolutional network. The approach is evaluated on diverse documents, including invoices and receipts, and achieves results equal to or better than strong baselines.
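The pipeline described in the abstract (text regions as nodes, weighted edges between neighboring regions, node classification with graph convolutions) can be illustrated with a minimal sketch. The abstract does not state how neighbors are chosen, how edge weights are computed, or how the network is configured, so everything below is an assumption made for illustration: the k-nearest-neighbor rule, the inverse-distance weight `1 / (1 + d)`, the two-layer structure, and all function names are hypothetical and are not taken from the paper.

```python
import math


def build_weighted_graph(centers, k=2):
    """Build a symmetric weight matrix over text regions.

    centers: list of (x, y) region centers.
    Assumption: each region links to its k nearest regions, and the
    edge weight is 1 / (1 + distance). The abstract specifies neither.
    """
    n = len(centers)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nearest = sorted(
            (math.dist(centers[i], centers[j]), j) for j in range(n) if j != i
        )
        for d, j in nearest[:k]:
            w = 1.0 / (1.0 + d)  # closer regions get larger weights
            W[i][j] = max(W[i][j], w)
            W[j][i] = max(W[j][i], w)
    for i in range(n):
        W[i][i] = 1.0  # self-loop so a node keeps its own features
    return W


def normalize(W):
    """Symmetric normalization D^{-1/2} W D^{-1/2} of the weight matrix."""
    deg = [sum(row) for row in W]
    n = len(W)
    return [[W[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def gcn_layer(A_hat, H, Theta, relu=True):
    """One graph convolution: A_hat @ H @ Theta, optionally followed by ReLU."""
    Z = matmul(matmul(A_hat, H), Theta)
    if relu:
        Z = [[max(0.0, z) for z in row] for row in Z]
    return Z


def classify_nodes(A_hat, X, Theta1, Theta2):
    """Two-layer forward pass; returns the arg-max class index per node."""
    H = gcn_layer(A_hat, X, Theta1)
    logits = gcn_layer(A_hat, H, Theta2, relu=False)
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


# Toy input: three text regions with placeholder 2-d node features.
centers = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Theta1 = [[0.5, -0.2], [0.1, 0.3]]   # untrained, illustrative weights
Theta2 = [[0.4, -0.1], [-0.3, 0.2]]  # untrained, illustrative weights

W = build_weighted_graph(centers, k=2)
A_hat = normalize(W)
labels = classify_nodes(A_hat, X, Theta1, Theta2)
```

In a real system the node features would come from visual, textual, and spatial encoders and the weight matrices would be learned from labeled documents; here they are placeholders that only show how edge weights enter the neighborhood aggregation.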
Pages: 9