A large-scale dataset for end-to-end table recognition in the wild

Cited by: 7
Authors
Yang, Fan [1]
Hu, Lei [1]
Liu, Xinwu [2]
Huang, Shuangping [1,3]
Gu, Zhenghui [4]
Affiliations
[1] South China Univ Technol, Sch Elect & Informat Engn, Guangzhou 510641, Peoples R China
[2] Zhuzhou CRRC Times Elect Co Ltd, Zhuzhou 412001, Peoples R China
[3] Pazhou Lab, Guangzhou 510335, Peoples R China
[4] South China Univ Technol, Coll Automat Sci & Engn, Guangzhou 510641, Peoples R China
DOI
10.1038/s41597-023-01985-8
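Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Subject classification codes
07; 0710; 09
Abstract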
Table recognition (TR) is a research hotspot in pattern recognition that aims to extract information from tables in images. Common TR tasks include table detection (TD), table structure recognition (TSR) and table content recognition (TCR). TD locates tables in the image, TCR recognizes text content, and TSR recognizes spatial and logical (ontology) structure. End-to-end TR in real scenarios, which accomplishes the three sub-tasks simultaneously, remains an unexplored research area. One major obstacle is the lack of a benchmark dataset. To this end, we propose a new large-scale dataset named Table Recognition Set (TabRecSet), with diverse table forms sourced from multiple scenarios in the wild and complete annotations dedicated to end-to-end TR research. It is the largest and the first bilingual dataset for end-to-end TR, with 38.1 K tables, of which 20.4 K are in English and 17.7 K are in Chinese. The samples take diverse forms, such as border-complete and border-incomplete tables, and regular and irregular tables (rotated, distorted, etc.). The scenarios vary from scanned to camera-taken images, from documents to Excel tables, and from educational test papers to financial invoices. The annotations are complete, consisting of table body spatial annotation, cell spatial and logical annotation, and text content for TD, TSR and TCR, respectively. The spatial annotation uses polygons instead of the bounding boxes or quadrilaterals adopted by most datasets; polygons are better suited to the irregular tables that are common in wild scenarios. Additionally, we propose a visual, interactive annotation tool named TableMe to improve the efficiency and quality of table annotation.
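The abstract's point about polygon versus bounding-box annotation can be illustrated with a small sketch. The `CellAnnotation` record and its field names below are hypothetical and are not TabRecSet's actual annotation schema; the example only shows, under that assumption, how a cell might carry spatial, logical and text annotation together, and how an axis-aligned bounding box overestimates the area of a rotated cell.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class CellAnnotation:
    """Hypothetical cell record; NOT TabRecSet's actual schema."""
    polygon: List[Point]  # spatial annotation: cell outline, vertices in order
    row: int              # logical annotation: starting row index
    col: int              # logical annotation: starting column index
    row_span: int         # number of rows the cell covers
    col_span: int         # number of columns the cell covers
    text: str             # content annotation


def polygon_area(points: List[Point]) -> float:
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def bounding_box_area(points: List[Point]) -> float:
    """Area of the axis-aligned box that encloses the polygon."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


# A rotated rectangular cell, as might appear in a camera-taken image.
cell = CellAnnotation(
    polygon=[(0, 2), (4, 0), (5, 2), (1, 4)],
    row=0, col=0, row_span=1, col_span=1,
    text="Total",
)

tight = polygon_area(cell.polygon)       # area actually covered by the cell
loose = bounding_box_area(cell.polygon)  # area a bounding box would claim
```

For this rotated cell the polygon covers 10 square units while the enclosing box covers 20, so half of the box belongs to neighbouring cells or background. Distorted tables, where cell edges are curved, would need more than four vertices per polygon.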
Pages: 14