CasTabDetectoRS: Cascade Network for Table Detection in Document Images with Recursive Feature Pyramid and Switchable Atrous Convolution

Cited by: 19

Authors:

Hashmi, Khurram Azeem ^{[1,2,3]}

Pagani, Alain ^{[3]}

Liwicki, Marcus ^{[4]}

Stricker, Didier ^{[1,3]}

Afzal, Muhammad Zeshan ^{[1,2,3]}

Affiliations:

[1] Tech Univ Kaiserslautern, Dept Comp Sci, D-67663 Kaiserslautern, Germany

[2] Tech Univ Kaiserslautern, Mindgarage, D-67663 Kaiserslautern, Germany

[3] German Res Inst Artificial Intelligence DFKI, D-67663 Kaiserslautern, Germany

[4] Lulea Univ Technol, Dept Comp Sci, S-97187 Lulea, Sweden

Source:

JOURNAL OF IMAGING | 2021 / Volume 7 / Issue 10

Keywords:

table detection; table recognition; cascade Mask R-CNN; atrous convolution; recursive feature pyramid networks; document image analysis; deep neural networks; computer vision; object detection; RECOGNITION;

DOI:

10.3390/jimaging7100214

Chinese Library Classification number:

TB8 [Photographic technology];

Discipline classification code:

0804;

Abstract:

Table detection is a preliminary step in extracting reliable information from tables in scanned document images. We present CasTabDetectoRS, a novel end-to-end trainable table detection framework that operates on Cascade Mask R-CNN, including Recursive Feature Pyramid network and Switchable Atrous Convolution in the existing backbone architecture. By utilizing a comparatively lightweight backbone of ResNet-50, this paper demonstrates that superior results are attainable without relying on pre- and post-processing methods, heavier backbone networks (ResNet-101, ResNeXt-152), or memory-intensive deformable convolutions. We evaluate the proposed approach on five different publicly available table detection datasets. Our CasTabDetectoRS outperforms the previous state-of-the-art results on four datasets (ICDAR-19, TableBank, UNLV, and Marmot) and achieves comparable results on ICDAR-17 POD. Compared with previous state-of-the-art results, we obtain a significant relative error reduction of 56.36%, 20%, 4.5%, and 3.5% on the ICDAR-19, TableBank, UNLV, and Marmot datasets, respectively. Furthermore, this paper sets a new benchmark by performing exhaustive cross-dataset evaluations to exhibit the generalization capabilities of the proposed method.
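The abstract reports its gains as relative error reduction but does not spell out the formula. A minimal sketch of one common reading follows, assuming the error is taken as 1 minus a score in [0, 1] (for example, an F1-measure). The function name and the numbers in the usage line are hypothetical illustrations and are not taken from the paper.

```python
def relative_error_reduction(baseline_score: float, new_score: float) -> float:
    """Fraction of the baseline error removed by the new method.

    Assumption: both scores lie in [0, 1] and error is defined as 1 - score.
    """
    baseline_error = 1.0 - baseline_score
    new_error = 1.0 - new_score
    if baseline_error <= 0.0:
        raise ValueError("baseline has no error left to reduce")
    return (baseline_error - new_error) / baseline_error


# Hypothetical scores, chosen only to illustrate the arithmetic.
reduction = relative_error_reduction(baseline_score=0.80, new_score=0.85)
print(f"relative error reduction: {reduction:.2%}")
```

Under this reading, a small absolute gain on an already high score can correspond to a large relative error reduction, which is why the two kinds of figures should not be compared directly.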


Pages: 23

