TableRobot: An Automatic Annotation Method for Heterogeneous Tables

Cited by: 0
Authors
Wu, Guibin [1 ]
Zhou, Junjie [1 ]
Yang, Jingshu [1 ]
Lv, Xiaobing [1 ]
Xiong, Yongping [1 ]
Affiliation
[1] Beijing Univ Posts & Telecommun, Sch Comp Sci, 10th Xitucheng Rd, Beijing 100876, Peoples R China
Source
2020 INTERNATIONAL CONFERENCE ON IDENTIFICATION, INFORMATION AND KNOWLEDGE IN THE INTERNET OF THINGS (IIKI2020) | 2021 / Vol. 187
Keywords
Table recognition; Dataset; Deep learning; Annotation; TableRobot
DOI
10.1016/j.procs.2021.04.081
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Table recognition with deep learning networks has attracted considerable attention. However, the lack of high-quality table datasets limits the performance of these networks. We therefore propose TableRobot, an automatic annotation method for heterogeneous tables. Specifically, the annotation of a table consists of the coordinates of each item block and the mapping between item blocks and table cells. To solve this annotation task, we design an algorithm based on a greedy approach to find the optimum solution. To evaluate TableRobot, we check the annotation data of 3000 tables collected from LaTeX documents on arXiv.com; the results show that TableRobot generates table annotation datasets with an accuracy of 93.2%. In addition, when the annotation data is fed into GraphTSR, a state-of-the-art graph neural network for table recognition, the network's F1 score increases by nearly 10%. (C) 2021 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0). Peer-review under responsibility of the scientific committee of the International Conference on Identification, Information and Knowledge in the Internet of Things, 2020.
Pages: 432-439
Page count: 8