TransTab: A transformer-based approach for table detection and tabular data extraction from scanned document images

Cited by: 0

Authors:

Wang, Yongzhou [1]

Lv, Wenliang [2]

Wu, Weijie [1]

Xie, Guanheng [2]

Lu, BiBo [3]

Wang, ChunYang [3]

Zhan, Chao [3]

Su, Baishun [3]

Affiliations:

[1] Xinxiang Energy Co Ltd, Jiaozuo Coal Ind Grp, Xinxiang 453633, Peoples R China

[2] GL Technol Co Ltd, Zhengzhou 450000, Peoples R China

[3] Henan Polytech Univ, Sch Comp Sci & Technol, Jiaozuo 454000, Peoples R China

Source:

MACHINE LEARNING WITH APPLICATIONS | 2025 / Vol. 20

Keywords:

Deep learning; Table detection; Attention mechanisms; Convolutional neural networks (CNNs); Transformer; OCR

DOI:

10.1016/j.mlwa.2025.100665

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Table detection and content extraction are crucial tasks in document analysis. Traditional convolutional neural network (CNN) methods often face limitations when dealing with complex tables, such as cross-column, cross-row, and multi-dimensional tables. Although existing methods perform well on simpler tables, their effectiveness often falls short of practical application needs on complex layouts. The structural intricacy of tables requires more advanced recognition and extraction strategies, particularly for the precise localization and extraction of rows and columns. To address the shortcomings of traditional methods in handling complex table structures, this paper proposes TransTab, an end-to-end, Transformer-based method for document table detection and content extraction. TransTab overcomes the limitations of traditional CNN approaches by incorporating the Vision Transformer (ViT) into the table recognition task, enabling it to handle complex table structures that span columns and rows. The self-attention mechanism of ViT allows the model to capture long-range dependencies within the table, resulting in high accuracy in detecting table boundaries, cell separations, and internal table structures. The paper also introduces separate modules for table detection and column detection, which are responsible for recognizing the overall table structure and accurately positioning columns, respectively. This modular design lets the model adapt better to tables with diverse, complex layouts, improving its ability to process intricate tables. Finally, EasyOCR is used to extract text from the table. Experimental results demonstrate that TransTab outperforms state-of-the-art methods across several metrics. This research provides a novel solution for the automatic recognition and processing of document tables, paving the way for future research in document tasks.
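The abstract does not say how the OCR output is combined with the detected columns, so the following is only a minimal sketch of one plausible post-processing step, not the authors' method. It assumes the column detector yields axis-aligned boxes and that the OCR stage yields items shaped like EasyOCR's `readtext` results (four corner points, text, confidence). The names `to_box`, `column_index`, `build_rows`, and the `row_tolerance` parameter are hypothetical and introduced here for illustration.

```python
from typing import List, Tuple

# Hypothetical types for illustration; they are not taken from the paper.
Box = Tuple[float, float, float, float]         # (x_min, y_min, x_max, y_max)
OcrItem = Tuple[List[List[float]], str, float]  # (four corner points, text, confidence)


def to_box(corners: List[List[float]]) -> Box:
    """Convert four corner points into an axis-aligned bounding box."""
    xs = [p[0] for p in corners]
    ys = [p[1] for p in corners]
    return (min(xs), min(ys), max(xs), max(ys))


def column_index(word_box: Box, columns: List[Box]) -> int:
    """Return the index of the column containing the word's horizontal centre, or -1."""
    centre_x = (word_box[0] + word_box[2]) / 2
    for i, (x0, _y0, x1, _y1) in enumerate(columns):
        if x0 <= centre_x < x1:
            return i
    return -1


def build_rows(ocr_items: List[OcrItem], columns: List[Box],
               row_tolerance: float = 10.0) -> List[List[str]]:
    """Group OCR words into rows by vertical centre, then place each word in its column."""
    words = []
    for corners, text, _confidence in ocr_items:
        box = to_box(corners)
        col = column_index(box, columns)
        if col < 0:
            continue  # the word lies outside every detected column
        centre_y = (box[1] + box[3]) / 2
        words.append((centre_y, box[0], col, text))
    words.sort()  # top to bottom

    # A word joins the current row if its centre is close to the row's first word.
    grouped = []
    for centre_y, x, col, text in words:
        if grouped and abs(centre_y - grouped[-1][0]) <= row_tolerance:
            grouped[-1][1].append((x, col, text))
        else:
            grouped.append((centre_y, [(x, col, text)]))

    rows = []
    for _anchor_y, members in grouped:
        cells = [""] * len(columns)
        for _x, col, text in sorted(members):  # left to right within the row
            cells[col] = (cells[col] + " " + text).strip()
        rows.append(cells)
    return rows


# Made-up sample data: two columns and five OCR words.
columns = [(0, 0, 100, 200), (100, 0, 200, 200)]
ocr_items = [
    ([[10, 10], [60, 10], [60, 30], [10, 30]], "Item", 0.99),
    ([[110, 12], [150, 12], [150, 32], [110, 32]], "Unit", 0.98),
    ([[155, 10], [195, 10], [195, 30], [155, 30]], "price", 0.97),
    ([[10, 50], [60, 50], [60, 70], [10, 70]], "Coal", 0.95),
    ([[110, 50], [160, 50], [160, 70], [110, 70]], "42", 0.96),
]
table = build_rows(ocr_items, columns)
print(table)  # [['Item', 'Unit price'], ['Coal', '42']]
```

Grouping by vertical centre with a fixed tolerance is a simplification: it handles neither cells that span rows or columns nor skewed scans, which are the cases the abstract says the learned detection modules are meant to address.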

Pages: 13

References (48 in total):

[31] Paliwal Shubham Singh, 2019, 2019 International Conference on Document Analysis and Recognition (ICDAR). Proceedings, P128, DOI 10.1109/ICDAR.2019.00029

[32] A Review on Random Forest: An Ensemble Classifier [J]. Parmar, Aakash; Katariya, Rakesh; Patel, Vatsal. INTERNATIONAL CONFERENCE ON INTELLIGENT DATA COMMUNICATION TECHNOLOGIES AND INTERNET OF THINGS, ICICI 2018, 2019, 26: 758-763

[33] Paszke A, 2019, ADV NEUR IN, V32

[34] Spreadsheet quality assurance: a literature review [J]. Poon, Pak-Lok; Lau, Man Fai; Yu, Yuen Tak; Tang, Sau-Fun. FRONTIERS OF COMPUTER SCIENCE, 2024, 18(02)

[35] CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents [J]. Prasad, Devashish; Gadpal, Ayan; Kapadni, Kshitij; Visave, Manish; Sultanpure, Kavita. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2020), 2020: 2439-2447

[36] Pyreddy P, 1997, ACM DIGITAL LIBRARIES '97, P193

[37] Salehudin M., 2023, Journal of physics: conference series, V2641

[38] Samantaray Milan, 2021, 2021 5th International Conference on Electronics, Communication and Aerospace Technology (ICECA), P849, DOI 10.1109/ICECA52323.2021.9676015

[39] DeepDeSRT: Deep Learning for Detection and Structure Recognition of Tables in Document Images [J]. Schreiber, Sebastian; Agne, Stefan; Wolf, Ivo; Dengel, Andreas; Ahmed, Sheraz. 2017 14TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), VOL 1, 2017: 1162-1167

[40] DeCNT: Deep Deformable CNN for Table Detection [J]. Siddiqui, Shoaib Ahmed; Malik, Muhammad Imran; Agne, Stefan; Dengel, Andreas; Ahmed, Sheraz. IEEE ACCESS, 2018, 6: 74151-74161
