Multimodal weighted graph representation for information extraction from visually rich documents

Cited by: 2
Authors
Gbada, Hamza [1, 2]
Kalti, Karim [2, 3]
Mahjoub, Mohamed Ali [2]
Affiliations
[1] Univ Sousse, Higher Inst Informat & Commun Technol, Sousse, Tunisia
[2] Natl Engn Sch Sousse ENISo, Lab Adv Technol & Intelligent Syst LATIS, Sousse, Tunisia
[3] Univ Monastir, Fac Sci Monastir, Monastir, Tunisia
Keywords
Information extraction; Visually rich documents; Graph convolutional networks
DOI
10.1016/j.neucom.2023.127223
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper introduces a novel system for information extraction from visually rich documents (VRDs) using a weighted graph representation. The proposed method aims to improve information extraction performance by capturing the relationships between the components of a VRD. Each VRD is modeled as a weighted graph: nodes encode the visual, textual, and spatial features of text regions, and edges represent the relationships between neighboring text regions. Information extraction is then cast as a node classification task, in which the VRD graphs are fed into a graph convolutional network. The approach is evaluated on diverse documents, including invoices and receipts, and achieves results that match or surpass strong baselines.
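The pipeline described in the abstract (text regions as nodes, weighted edges between neighbors, node classification by graph convolution) can be illustrated with a minimal sketch. The abstract does not specify how neighbors are chosen, how edges are weighted, or what the network looks like, so everything below is an assumption made for illustration: the function names `build_weighted_graph` and `gcn_layer`, the k-nearest-neighbor rule, the `1 / (1 + distance)` edge weight, the toy feature vectors, and the identity weight matrix are all hypothetical and are not taken from the paper.

```python
import math


def build_weighted_graph(boxes, k=2):
    """Return a symmetric weighted adjacency matrix over text regions.

    boxes: list of (x_min, y_min, x_max, y_max), one per text region.
    ASSUMPTION: each region is linked to its k nearest regions (by box-centre
    distance) and the edge weight is 1 / (1 + distance). The paper's actual
    neighbourhood rule and weighting are not given in the abstract.
    """
    centres = [((x0 + x1) / 2, (y0 + y1) / 2) for x0, y0, x1, y1 in boxes]
    n = len(boxes)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        dists = sorted(
            (math.dist(centres[i], centres[j]), j) for j in range(n) if j != i
        )
        for d, j in dists[:k]:
            adj[i][j] = adj[j][i] = 1.0 / (1.0 + d)
    return adj


def gcn_layer(adj, feats, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalisation by weighted degree.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    n_in, n_out = len(weight), len(weight[0])
    # Aggregate neighbour features, then apply the linear map and ReLU.
    agg = [[sum(norm[i][j] * feats[j][f] for j in range(n)) for f in range(n_in)]
           for i in range(n)]
    return [[max(0.0, sum(agg[i][f] * weight[f][c] for f in range(n_in)))
             for c in range(n_out)] for i in range(n)]


# Toy document with three text regions (hypothetical coordinates and features).
boxes = [(0, 0, 10, 5), (12, 0, 22, 5), (0, 50, 10, 55)]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # stand-in for fused node features
weight = [[1.0, 0.0], [0.0, 1.0]]              # untrained placeholder weights

adj = build_weighted_graph(boxes, k=1)
scores = gcn_layer(adj, feats, weight)
# Node classification: pick the highest-scoring class per text region.
labels = [max(range(len(row)), key=row.__getitem__) for row in scores]
```

In a real system the node features would come from text, image, and layout encoders, the weights would be learned, and several layers would be stacked; the sketch only shows how a weighted adjacency matrix lets each text region's prediction draw on its neighbors.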
Pages: 9
Related papers
49 in total
  • [1] Information Extraction from Visually Rich Documents Using Directed Weighted Graph Neural Network
    Gbada, Hamza
    Kalti, Karim
    Mahjoub, Mohamed Ali
    DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024, PT VI, 2024, 14809 : 248 - 263
  • [2] Information Extraction from Text Intensive and Visually Rich Banking Documents
    Oral, Berke
    Emekligil, Erdem
    Arslan, Secil
    Eryigit, Gulsen
    INFORMATION PROCESSING & MANAGEMENT, 2020, 57 (06)
  • [3] Visual Segmentation for Information Extraction from Heterogeneous Visually Rich Documents
    Sarkhel, Ritesh
    Nandi, Arnab
    SIGMOD '19: PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2019, : 247 - 262
  • [4] A Span Extraction Approach for Information Extraction on Visually-Rich Documents
    Nguyen, Tuan-Anh D.
    Vu, Hieu M.
    Nguyen Hong Son
    Minh-Tien Nguyen
    DOCUMENT ANALYSIS AND RECOGNITION, ICDAR 2021, PT II, 2021, 12917 : 353 - 363
  • [5] Deep learning approaches for information extraction from visually rich documents: datasets, challenges and methods
    Gbada, Hamza
    Kalti, Karim
    Mahjoub, Mohamed Ali
    INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION, 2024, : 121 - 142
  • [6] VisualWordGrid: Information Extraction from Scanned Documents Using a Multimodal Approach
    Kerroumi, Mohamed
    Sayem, Othmane
    Shabou, Aymen
    DOCUMENT ANALYSIS AND RECOGNITION, ICDAR 2021, PT II, 2021, 12917 : 389 - 402
  • [7] Fusion of visual representations for multimodal information extraction from unstructured transactional documents
    Berke Oral
    Gülşen Eryiğit
    International Journal on Document Analysis and Recognition (IJDAR), 2022, 25 : 187 - 205
  • [8] Fusion of visual representations for multimodal information extraction from unstructured transactional documents
    Oral, Berke
    Eryigit, Gulsen
    INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION, 2022, 25 (3) : 187 - 205
  • [9] Information Extraction and Graph Representation for the Design of Formulated Products
    Sunkle, Sagar
    Saxena, Krati
    Patil, Ashwini
    Kulkarni, Vinay
    Jain, Deepak
    Chacko, Rinu
    Rai, Beena
    ADVANCED INFORMATION SYSTEMS ENGINEERING, CAISE 2020, 2020, 12127 : 433 - 448
  • [10] VRDSynth: Synthesizing Programs for Multilingual Visually Rich Document Information Extraction
    Thanh-Dat Nguyen
    Tung Do-Viet
    Hung Nguyen-Duy
    Tuan-Hai Luu
    Le, Hung
    Le, Bach
    Thongtanunam, Patanamon
    PROCEEDINGS OF THE 33RD ACM SIGSOFT INTERNATIONAL SYMPOSIUM ON SOFTWARE TESTING AND ANALYSIS, ISSTA 2024, 2024, : 704 - 716