CasTabDetectoRS: Cascade Network for Table Detection in Document Images with Recursive Feature Pyramid and Switchable Atrous Convolution

Cited by: 19
Authors
Hashmi, Khurram Azeem [1, 2, 3]
Pagani, Alain [3]
Liwicki, Marcus [4]
Stricker, Didier [1, 3]
Afzal, Muhammad Zeshan [1, 2, 3]
Affiliations
[1] Tech Univ Kaiserslautern, Dept Comp Sci, D-67663 Kaiserslautern, Germany
[2] Tech Univ Kaiserslautern, Mindgarage, D-67663 Kaiserslautern, Germany
[3] German Res Inst Artificial Intelligence DFKI, D-67663 Kaiserslautern, Germany
[4] Lulea Univ Technol, Dept Comp Sci, S-97187 Lulea, Sweden
Keywords
table detection; table recognition; cascade Mask R-CNN; atrous convolution; recursive feature pyramid networks; document image analysis; deep neural networks; computer vision; object detection; RECOGNITION;
DOI
10.3390/jimaging7100214
CLC number
TB8 [Photographic technology]
Discipline code
0804
Abstract
Table detection is a preliminary step in extracting reliable information from tables in scanned document images. We present CasTabDetectoRS, a novel end-to-end trainable table detection framework built on Cascade Mask R-CNN that incorporates a Recursive Feature Pyramid network and Switchable Atrous Convolution into the existing backbone architecture. Using the comparatively lightweight ResNet-50 backbone, this paper demonstrates that superior results are attainable without relying on pre- and post-processing methods, heavier backbone networks (ResNet-101, ResNeXt-152), or memory-intensive deformable convolutions. We evaluate the proposed approach on five publicly available table detection datasets. CasTabDetectoRS outperforms the previous state-of-the-art results on four datasets (ICDAR-19, TableBank, UNLV, and Marmot) and achieves comparable results on ICDAR-17 POD. Compared with the previous state of the art, we obtain significant relative error reductions of 56.36%, 20%, 4.5%, and 3.5% on ICDAR-19, TableBank, UNLV, and Marmot, respectively. Furthermore, this paper sets a new benchmark by performing exhaustive cross-dataset evaluations to demonstrate the generalization capabilities of the proposed method.
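The abstract reports its gains as relative error reductions rather than absolute score differences. A minimal sketch of how such a figure is typically computed follows, assuming that "error" means 1 minus a score in [0, 1] such as an F1 score; the function name and the example scores are hypothetical and are not the paper's reported results.

```python
def relative_error_reduction(prev_score: float, new_score: float) -> float:
    """Return the relative error reduction, in percent, when a score in
    [0, 1] improves from prev_score to new_score.

    Assumption (hypothetical helper): error is defined as 1 - score,
    e.g. 1 - F1. This is an illustration, not the paper's own code.
    """
    if not (0.0 <= prev_score < 1.0 and 0.0 <= new_score <= 1.0):
        raise ValueError("scores must lie in [0, 1] and prev_score must be < 1")
    prev_error = 1.0 - prev_score  # error of the previous state of the art
    new_error = 1.0 - new_score    # error of the new method
    # Fraction of the previous error that the new method removes.
    return 100.0 * (prev_error - new_error) / prev_error


# Hypothetical scores, NOT taken from the paper.
print(round(relative_error_reduction(0.90, 0.95), 2))  # → 50.0
print(round(relative_error_reduction(0.98, 0.99), 2))  # → 50.0
```

The two calls show why this metric is favoured near saturation: a 5-point absolute gain from 0.90 and a 1-point gain from 0.98 both halve the remaining error, so a large relative reduction can correspond to a small absolute improvement.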
Pages: 23