TransTab: A transformer-based approach for table detection and tabular data extraction from scanned document images

Cited: 0
Authors
Wang, Yongzhou [1 ]
Lv, Wenliang [2 ]
Wu, Weijie [1 ]
Xie, Guanheng [2 ]
Lu, BiBo [3 ]
Wang, ChunYang [3 ]
Zhan, Chao [3 ]
Su, Baishun [3 ]
Affiliations
[1] Xinxiang Energy Co Ltd, Jiaozuo Coal Ind Grp, Xinxiang 453633, Peoples R China
[2] GL Technol Co Ltd, Zhengzhou 450000, Peoples R China
[3] Henan Polytech Univ, Sch Comp Sci & Technol, Jiaozuo 454000, Peoples R China
Source
MACHINE LEARNING WITH APPLICATIONS | 2025, Vol. 20
Keywords
Deep learning; Table detection; Attention mechanisms; Convolutional neural networks (CNNs); Transformer; OCR
DOI
10.1016/j.mlwa.2025.100665
CLC classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Table detection and content extraction are crucial tasks in document analysis. Traditional convolutional neural network (CNN) methods often struggle with complex tables, such as cross-column, cross-row, and multi-dimensional tables. Although existing methods perform well on simpler tables, their effectiveness often falls short of practical application requirements for complex layouts. The structural intricacy of tables calls for more advanced recognition and extraction strategies, particularly for the precise localization and extraction of rows and columns. To address the shortcomings of traditional methods in handling complex table structures, this paper proposes an end-to-end Transformer-based method for document table detection and content extraction, named TransTab. TransTab overcomes the limitations of traditional CNN approaches by incorporating the Vision Transformer (ViT) into the table recognition task, enabling it to handle complex table structures that span columns and rows. The self-attention mechanism of ViT allows the model to capture long-range dependencies within a table, yielding high accuracy in detecting table boundaries, cell separations, and internal table structures. The paper also introduces separate table detection and column detection modules, responsible for recognizing the overall table structure and accurately locating columns, respectively. Through this modular design, the model adapts better to tables with diverse, complex layouts, improving its ability to process intricate tables. Finally, EasyOCR is employed to extract text from the detected tables. Experimental results demonstrate that TransTab outperforms state-of-the-art methods on several metrics. This research provides a novel solution for the automatic recognition and processing of document tables, paving the way for future research on document analysis tasks.
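The abstract describes a three-stage pipeline: a ViT-based table detection module, a column detection module, and EasyOCR-based text extraction. The sketch below only illustrates that flow; the detector interfaces (detect_regions, table_detector, column_detector) are hypothetical placeholders rather than the authors' released code, and EasyOCR is used as the abstract names it.

```python
# Minimal sketch of a TransTab-style pipeline, assuming two transformer-based
# detectors that return bounding boxes. The detector objects are placeholders.
import easyocr
import numpy as np
from PIL import Image


def detect_regions(image, detector):
    """Placeholder: run a detector and return a list of [x0, y0, x1, y1] boxes."""
    return detector(image)  # assumed interface, not the authors' API


def extract_table_text(page_path, table_detector, column_detector, langs=("en",)):
    """Detect tables, then columns inside each table, then OCR each column crop."""
    page = np.array(Image.open(page_path).convert("RGB"))
    reader = easyocr.Reader(list(langs))  # EasyOCR text recognizer
    results = []
    for tx0, ty0, tx1, ty1 in detect_regions(page, table_detector):
        table_crop = page[ty0:ty1, tx0:tx1]
        for cx0, cy0, cx1, cy1 in detect_regions(table_crop, column_detector):
            column_crop = table_crop[cy0:cy1, cx0:cx1]
            # readtext returns (bbox, text, confidence) tuples; keep the text only
            texts = [text for _, text, _ in reader.readtext(column_crop)]
            results.append(texts)
    return results
```

In practice the two detectors could be any transformer detection heads that output table and column boxes; the point of the sketch is the modular table-then-column decomposition the abstract describes, followed by per-region OCR.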
Pages: 13