Toward Semi-Supervised Graphical Object Detection in Document Images

Cited by: 7

Authors:

Kallempudi, Goutham [1]

Hashmi, Khurram Azeem [1,2,3]

Pagani, Alain [3]

Liwicki, Marcus [4]

Stricker, Didier [1,3]

Afzal, Muhammad Zeshan [1,2,3]

Affiliations:

[1] Tech Univ Kaiserslautern, Dept Comp Sci, D-67663 Kaiserslautern, Germany

[2] Tech Univ Kaiserslautern, Mindgarage, D-67663 Kaiserslautern, Germany

[3] German Res Inst Artificial Intelligence DFKI, D-67663 Kaiserslautern, Germany

[4] Lulea Univ Technol, Dept Comp Sci, S-97187 Lulea, Sweden

Source:

FUTURE INTERNET | 2022 / Vol. 14 / Issue 06

Funding:

European Union Horizon 2020;

Keywords:

graphical page objects; object detection; document image analysis; semi-supervised; soft teacher; TABLE RECOGNITION; PERFORMANCE;

DOI:

10.3390/fi14060176

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Graphical page object detection classifies and localizes objects such as tables and figures in a document. As deep learning techniques for object detection have become increasingly successful, many supervised deep neural network-based methods have been introduced to recognize graphical objects in documents. However, these models require a substantial amount of labeled data for training. To address this limitation, this paper presents an end-to-end semi-supervised framework for graphical object detection in scanned document images. Our method builds on the recently proposed Soft Teacher mechanism and examines the effects of a small percentage of labeled data on the classification and localization of graphical objects. On both the PubLayNet and the IIIT-AR-13K datasets, the proposed approach outperforms the supervised models by a significant margin at all labeling ratios (1%, 5%, and 10%). Furthermore, the 10% PubLayNet Soft Teacher model improves the average precision of Table, Figure, and List by +5.4, +1.2, and +3.2 points, respectively, while achieving a total mAP similar to that of the Faster-RCNN baseline. Moreover, our model trained on 10% of the IIIT-AR-13K labeled data beats the previous fully supervised method by +4.5 points.
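
The semi-supervised setting in the abstract (a small labeled fraction plus a teacher that produces pseudo-labels for the rest) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the random seed, the momentum value, and the score threshold are assumptions chosen for illustration, and plain dictionaries stand in for model weights and detections.

```python
import random


def split_by_label_ratio(image_ids, ratio, seed=0):
    """Split image ids into a labeled subset (about `ratio` of the data)
    and an unlabeled remainder. `seed` is an illustrative assumption."""
    if not 0.0 < ratio < 1.0:
        raise ValueError("ratio must be in (0, 1)")
    ids = sorted(image_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_labeled = max(1, round(len(ids) * ratio))
    return ids[:n_labeled], ids[n_labeled:]


def ema_update(teacher, student, momentum=0.999):
    """Return teacher weights moved toward the student weights by an
    exponential moving average. The momentum value is an assumption."""
    return {
        name: momentum * teacher[name] + (1.0 - momentum) * student[name]
        for name in teacher
    }


def filter_pseudo_boxes(detections, score_threshold=0.9):
    """Keep only teacher detections confident enough to serve as
    pseudo-labels. The threshold value is an assumption."""
    return [d for d in detections if d["score"] >= score_threshold]


if __name__ == "__main__":
    # The three labeling ratios mentioned in the abstract.
    for ratio in (0.01, 0.05, 0.10):
        labeled, unlabeled = split_by_label_ratio(range(1000), ratio)
        print(ratio, len(labeled), len(unlabeled))
```

In a real pipeline, the student would be trained on the labeled subset together with the filtered pseudo-labels on the unlabeled subset, and the teacher would be refreshed with `ema_update` after each student step.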


Pages: 21
