DeepDeSRT: Deep Learning for Detection and Structure Recognition of Tables in Document Images

Cited by: 247
Authors
Schreiber, Sebastian [1 ,3 ]
Agne, Stefan [3 ]
Wolf, Ivo [1 ]
Dengel, Andreas [2 ,3 ]
Ahmed, Sheraz [3 ]
Affiliations
[1] Mannheim Univ Appl Sci, Mannheim, Germany
[2] Kaiserslautern Univ Technol, Kaiserslautern, Germany
[3] German Res Ctr Artificial Intelligence DFKI, Kaiserslautern, Germany
Source
2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Vol. 1 | 2017
Keywords
DOI
10.1109/ICDAR.2017.192
CLC classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
This paper presents a novel end-to-end system for table understanding in document images called DeepDeSRT. The contribution of DeepDeSRT is two-fold. First, it presents a deep learning-based solution for table detection in document images. Second, it proposes a novel deep learning-based approach for table structure recognition, i.e. identifying rows, columns, and cell positions in the detected tables. In contrast to existing rule-based methods, which rely on heuristics or additional PDF metadata (such as print instructions, character bounding boxes, or line segments), the presented system is data-driven and needs no heuristics or metadata to detect and recognize tabular structures in document images. Furthermore, in contrast to most existing table detection and structure recognition methods, which are applicable only to PDFs, DeepDeSRT processes document images, which makes it equally suitable for born-digital PDFs (as they can automatically be converted into images) as well as for harder inputs such as scanned documents. To gauge the performance of DeepDeSRT, the system is evaluated on the publicly available ICDAR 2013 table competition dataset containing 67 documents with 238 pages overall. Evaluation results reveal that DeepDeSRT outperforms state-of-the-art methods for table detection and structure recognition, achieving F1-measures of 96.77% and 91.44%, respectively. Additionally, DeepDeSRT is evaluated on a closed dataset from a real use case of a major European aviation company comprising documents that differ substantially from those in ICDAR 2013. Tested on a randomly selected sample from this dataset, DeepDeSRT achieves high table detection accuracy, which demonstrates the sound generalization capabilities of the system.
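The abstract describes table detection as a deep learning object-detection task on rasterized pages. The following is a minimal sketch of that idea, not the authors' implementation: it assumes a torchvision Faster R-CNN whose box head is replaced for a single "table" class and which would still need fine-tuning on table annotations before the predictions are meaningful.

```python
# Sketch only: a generic deep-learning table detector on a page image.
# This is NOT the DeepDeSRT code; model choice and thresholds are assumptions.
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.transforms.functional import to_tensor
from PIL import Image


def build_table_detector(num_classes: int = 2):
    """Faster R-CNN with the box predictor replaced for {background, table}."""
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    return model


def detect_tables(model, image_path: str, score_threshold: float = 0.8):
    """Return bounding boxes of detected tables on a rasterized document page."""
    model.eval()
    image = to_tensor(Image.open(image_path).convert("RGB"))
    with torch.no_grad():
        prediction = model([image])[0]
    keep = prediction["scores"] >= score_threshold
    return prediction["boxes"][keep].tolist()


# Hypothetical usage (page_0001.png is a placeholder file name):
# model = build_table_detector()
# boxes = detect_tables(model, "page_0001.png")
```

Because the detector operates on images rather than PDF internals, the same pipeline applies to born-digital PDFs rendered to images and to scanned documents, which is the generalization argument made in the abstract.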
Pages: 1162-1167
Number of pages: 6