Text and non-text separation in offline document images: a survey

Cited by: 36
Authors
Bhowmik, Showmik [1]
Sarkar, Ram [1]
Nasipuri, Mita [1]
Doermann, David [2]
Affiliations
[1] Jadavpur Univ, Kolkata, India
[2] Univ Maryland, Inst Adv Comp Studies, College Pk, MD 20742 USA
Keywords
Text/non-text separation; Segmentation; Offline document images; Engineering drawing; Map; Unconstrained handwritten document; Newspaper; Journal; Magazine; Check; Form; Survey; AMOUNT RECOGNITION; LAYOUT ANALYSIS; AUTOMATIC EXTRACTION; BLOCK SEGMENTATION; PAGE SEGMENTATION; SYSTEM; CLASSIFICATION; LINES; VIDEO; REPRESENTATION
DOI
10.1007/s10032-018-0296-z
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Separation of text and non-text is an essential processing step for any document analysis system. Therefore, it is important to have a clear understanding of the state of the art in text/non-text separation in order to facilitate the development of efficient document processing systems. This paper first summarizes the technical challenges of performing text/non-text separation. It then categorizes offline document images into different classes according to the nature of the challenges one faces, in an attempt to provide insight into the various techniques presented in the literature. The pros and cons of these techniques are explained wherever possible. Along with evaluation protocols and benchmark databases, this paper also presents a performance comparison of different methods. Finally, this article highlights future research challenges and directions in this domain.
Pages: 1-20
Number of pages: 20