A large-scale dataset for end-to-end table recognition in the wild

Times cited: 7
Authors
Yang, Fan [1]
Hu, Lei [1]
Liu, Xinwu [2]
Huang, Shuangping [1,3]
Gu, Zhenghui [4]
Affiliations
[1] South China Univ Technol, Sch Elect & Informat Engn, Guangzhou 510641, Peoples R China
[2] Zhuzhou CRRC Times Elect Co Ltd, Zhuzhou 412001, Peoples R China
[3] Pazhou Lab, Guangzhou 510335, Peoples R China
[4] South China Univ Technol, Coll Automat Sci & Engn, Guangzhou 510641, Peoples R China
DOI
10.1038/s41597-023-01985-8
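Chinese Library Classification (CLC) numbers
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [Natural sciences (general)]
Discipline classification codes
07; 0710; 09
Abstract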
Table recognition (TR) is an active research topic in pattern recognition that aims to extract information from tables in images. Common TR tasks include table detection (TD), table structure recognition (TSR) and table content recognition (TCR): TD locates tables in the image, TSR recognizes their spatial and ontological (logical) structure, and TCR recognizes their text content. End-to-end TR in real-world scenarios, which accomplishes all three sub-tasks simultaneously, remains an unexplored research area, and a major obstacle is the lack of a benchmark dataset. To this end, we propose a new large-scale dataset named Table Recognition Set (TabRecSet), with diverse table forms sourced from multiple in-the-wild scenarios and complete annotations dedicated to end-to-end TR research. It is the largest and the first bilingual dataset for end-to-end TR, with 38.1K tables, of which 20.4K are in English and 17.7K are in Chinese. The samples take diverse forms, such as border-complete and border-incomplete tables, and regular and irregular tables (rotated, distorted, etc.). The in-the-wild scenarios are varied, ranging from scanned to camera-captured images, from documents to Excel tables, and from educational test papers to financial invoices. The annotations are complete, consisting of table-body spatial annotations, cell spatial and logical annotations, and text content, for TD, TSR and TCR, respectively. The spatial annotations use polygons rather than the bounding boxes or quadrilaterals adopted by most datasets, because polygons are better suited to the irregular tables that are common in wild scenarios. Additionally, we propose a visualized, interactive annotation tool named TableMe to improve the efficiency and quality of table annotation.
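The claim that polygons suit irregular tables better than axis-aligned bounding boxes can be made concrete with a little geometry. The sketch below is illustrative only and is not TabRecSet's annotation format or the TableMe tool; the helper names (`shoelace_area`, `rotated_rect`, `bbox_background_fraction`) and the example of a 400 × 200 px table rotated by 30° are hypothetical. It measures how much of the bounding box drawn around a rotated table is background rather than table.

```python
import math

# Illustrative sketch only: not TabRecSet's annotation format or tooling.
# All helper names and the example table below are hypothetical.
Point = tuple[float, float]


def shoelace_area(poly: list[Point]) -> float:
    """Area of a simple polygon (vertices in order) via the shoelace formula."""
    n = len(poly)
    s = 0.0
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def bounding_box(poly: list[Point]) -> tuple[float, float, float, float]:
    """Axis-aligned bounding box (x_min, y_min, x_max, y_max) of a polygon."""
    xs = [x for x, _ in poly]
    ys = [y for _, y in poly]
    return min(xs), min(ys), max(xs), max(ys)


def rotated_rect(cx: float, cy: float, w: float, h: float, deg: float) -> list[Point]:
    """Four corners of a w x h rectangle centred at (cx, cy), rotated by deg degrees."""
    t = math.radians(deg)
    c, s = math.cos(t), math.sin(t)
    corners = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + x * c - y * s, cy + x * s + y * c) for x, y in corners]


def bbox_background_fraction(poly: list[Point]) -> float:
    """Fraction of the polygon's bounding box that lies outside the polygon."""
    x0, y0, x1, y1 = bounding_box(poly)
    box_area = (x1 - x0) * (y1 - y0)
    return 1.0 - shoelace_area(poly) / box_area


if __name__ == "__main__":
    table = rotated_rect(0.0, 0.0, 400.0, 200.0, 30.0)
    print(f"{bbox_background_fraction(table):.2f}")  # prints 0.52
```

For this hypothetical rotated table, about half of the bounding box is background, while the four-vertex polygon covers the table exactly. Distorted or curved tables may need more than four vertices, which a general polygon allows and a quadrilateral does not.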
Pages: 14