Table Recognition in Scanned Documents

Cited by: 1
Authors
Kazdar, Takwa [1, 2]
Jmal, Marwa [1]
Souidene, Wided [1]
Attia, Rabah [1]
Affiliations
[1] Univ Carthage, SERCOM Lab, Ecole Polytech Tunisie, La Marsa, Tunisia
[2] Telnet Technoctr, Telnet Holding, Les Berges Du Lac, Tunisia
Source
COMPUTATIONAL COLLECTIVE INTELLIGENCE, ICCCI 2022 | 2022 / Vol. 13501
Keywords
Table detection; Table structure recognition; Scanned document; Faster R-CNN; Heuristics
DOI
10.1007/978-3-031-16014-1_58
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Invoices are widely used in business. For each invoice, an employee has to carefully verify the written data, including the date and the legal and courtesy amounts present in each table. However, this task is not only time-consuming but also prone to inaccuracies and errors, especially when a large volume of invoices must be processed. A smart capture system is required to process invoices automatically, and building one is challenging because the relevant data are not narrative but arranged in tables. Although OCR (Optical Character Recognition) can read and capture data, it is inefficient at locating tables and loses the structural features of tabular data. Table recognition is widely carried out using deep learning and heuristics, and these methods have achieved results approaching those a human would obtain. In this paper, we present part of a smart capture system for invoices: a table recognition workflow for scanned invoices. This workflow consists of three main steps: first, a preprocessing step that enhances the quality of the scanned invoices; second, a deep learning-based table detection approach that uses DocCutout and DocCutmix for data augmentation; and third, a heuristic-based table structure recognition approach. The presented approaches are evaluated on public data sets.
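The abstract names the three stages but not their internals. The following is a minimal sketch of how such a workflow could look, assuming a grayscale page stored as nested Python lists (0 = ink, 255 = paper) and a table that the detector has already cropped, with its ruling lines removed. It uses global thresholding as a stand-in for preprocessing, a plain Cutout patch as a stand-in for augmentation, and projection profiles for row/column splitting. The helper names (`binarize`, `cutout_patch`, `ink_bands`, `recognize_grid`, `cells`) and the threshold of 128 are hypothetical. This is not the authors' DocCutout/DocCutmix augmentation, their Faster R-CNN detector, or their actual heuristics.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical sketch; not the authors' implementation.
Image = List[List[int]]      # grayscale rows: 0 = black ink, 255 = white paper
Band = Tuple[int, int]       # half-open [start, end) interval of pixel indices


def binarize(img: Image, threshold: int = 128) -> List[List[int]]:
    """Preprocessing stand-in: global thresholding, 1 = ink, 0 = background."""
    return [[1 if px < threshold else 0 for px in row] for row in img]


def cutout_patch(img: Image, rng: random.Random, size: int) -> Image:
    """Plain Cutout-style augmentation (assumed stand-in, not DocCutout):
    blank a random size x size patch with paper white. Returns a copy."""
    h, w = len(img), len(img[0])
    top = rng.randrange(0, max(1, h - size + 1))
    left = rng.randrange(0, max(1, w - size + 1))
    out = [row[:] for row in img]
    for y in range(top, min(h, top + size)):
        for x in range(left, min(w, left + size)):
            out[y][x] = 255
    return out


def ink_bands(profile: List[int]) -> List[Band]:
    """Split a projection profile into runs of non-zero values (text bands);
    runs of zeros are treated as whitespace separators."""
    bands: List[Band] = []
    start: Optional[int] = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i
        elif value == 0 and start is not None:
            bands.append((start, i))
            start = None
    if start is not None:
        bands.append((start, len(profile)))
    return bands


def recognize_grid(binary: List[List[int]]) -> Tuple[List[Band], List[Band]]:
    """Heuristic structure recognition for a cropped, borderless table:
    the horizontal projection gives rows, the vertical projection gives columns."""
    row_profile = [sum(row) for row in binary]
    col_profile = [sum(col) for col in zip(*binary)]
    return ink_bands(row_profile), ink_bands(col_profile)


def cells(rows: List[Band], cols: List[Band]) -> List[Tuple[int, int, int, int]]:
    """Cell boxes as (top, bottom, left, right), one per row/column pair."""
    return [(r0, r1, c0, c1) for r0, r1 in rows for c0, c1 in cols]


if __name__ == "__main__":
    # Toy 7x9 "table": two text rows x two text columns, separated by white gaps.
    page = [[255] * 9 for _ in range(7)]
    for y in (1, 2, 4, 5):
        for x in (1, 2, 5, 6, 7):
            page[y][x] = 0
    rows, cols = recognize_grid(binarize(page))
    print(rows, cols)              # [(1, 3), (4, 6)] [(1, 3), (5, 8)]
    print(len(cells(rows, cols)))  # 4
```

Projection profiles are only one common heuristic for borderless tables; ruled tables would normally need their lines detected or removed first, and the paper's own structure-recognition rules may differ.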
Pages: 744-754
Number of pages: 11