Text and non-text separation in offline document images: a survey

Cited by: 0
Authors
Showmik Bhowmik
Ram Sarkar
Mita Nasipuri
David Doermann
Affiliations
[1] Jadavpur University, Institute for Advanced Computer Studies
[2] University of Maryland
Source
International Journal on Document Analysis and Recognition (IJDAR) | 2018 / Vol. 21
Keywords
Text/non-text separation; Segmentation; Offline document images; Engineering drawing; Map; Unconstrained handwritten document; Newspaper; Journal; Magazine; Check; Form; Survey
Abstract
Separation of text and non-text is an essential processing step for any document analysis system. A clear understanding of the state of the art in text/non-text separation is therefore important for developing efficient document processing systems. This paper first summarizes the technical challenges of performing text/non-text separation. It then categorizes offline document images into classes according to the nature of the challenges each presents, in an attempt to provide insight into the various techniques presented in the literature. The pros and cons of these techniques are explained wherever possible. Along with evaluation protocols and benchmark databases, the paper also presents a performance comparison of different methods. Finally, it highlights future research challenges and directions in this domain.
Pages: 1-20
Number of pages: 19