Automatic Extraction of Text Regions from Document Images by Multilevel Thresholding and K-means Clustering

Cited by: 0
Authors
Hoai Nam Vu [1]
Tuan Anh Tran [1]
Na, In Seop [1]
Kim, Soo Hyung [1]
Affiliation
[1] Chonnam Natl Univ, Dept Comp Sci, 77 Yongbong Ro, Kwangju 500757, South Korea
Source
2015 IEEE/ACIS 14TH INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION SCIENCE (ICIS) | 2015
Keywords
Multilevel; K-mean; Connected Component;
DOI
Not available
Chinese Library Classification number
TP301 [Theory, methods];
Discipline classification code
081202;
Abstract
Textual data plays an important role in a number of applications, such as image database indexing, document understanding, and image-based web searching. The goal of automatically extracting text from real-life document images without a character recognition module is to identify image regions that contain only text. These textual regions can then either serve as input to an optical character recognition application or be highlighted to focus the user's attention. In this paper we propose a method that consists of three stages: preprocessing, which improves the contrast of the grayscale image; multilevel thresholding, which separates textual regions from non-textual objects such as graphics, pictures, and complex backgrounds; and a heuristic filter and a recursive filter, which localize text within the textual regions. In many of these applications it is not necessary to identify all the text regions; we therefore emphasize identifying important text regions with relatively large size and high contrast. Experimental results on a dataset of real-life images demonstrate that the proposed method is effective in identifying textual regions with various illuminations, sizes, and fonts against various types of background.
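The abstract does not give the details of the multilevel thresholding stage, so the following is only a minimal sketch of one common way to derive several intensity thresholds from a grayscale image: 1-D k-means over pixel intensities. The function names `kmeans_thresholds` and `quantize`, the choice of `k=3`, and the evenly spaced initialization are all assumptions made for illustration, not the authors' procedure.

```python
import numpy as np


def kmeans_thresholds(gray, k=3, iters=50):
    """Cluster pixel intensities into k groups with 1-D k-means.

    Returns the sorted cluster centers and the k-1 thresholds that lie
    midway between neighboring centers. (Hypothetical helper, not taken
    from the paper.)
    """
    pixels = gray.astype(np.float64).ravel()
    # Deterministic initialization: spread centers over the intensity range.
    centers = np.linspace(pixels.min(), pixels.max(), k)
    for _ in range(iters):
        # Assign each pixel to its nearest center.
        # Note: this builds an N x k array, fine for small images only.
        dist = np.abs(pixels[:, None] - centers[None, :])
        labels = np.argmin(dist, axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = pixels[labels == j]
            if members.size:  # keep the old center if a cluster is empty
                new_centers[j] = members.mean()
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    centers = np.sort(centers)
    thresholds = (centers[:-1] + centers[1:]) / 2.0
    return centers, thresholds


def quantize(gray, thresholds):
    """Map each pixel to a level index in 0..k-1 using the thresholds."""
    return np.digitize(gray, thresholds)


if __name__ == "__main__":
    # Synthetic image with three intensity bands (dark, mid, bright).
    img = np.zeros((6, 6), dtype=np.uint8)
    img[:2] = 20
    img[2:4] = 120
    img[4:] = 230
    centers, thresholds = kmeans_thresholds(img, k=3)
    print(centers, thresholds)
    print(quantize(img, thresholds))
```

In a pipeline like the one the abstract outlines, the resulting level map would presumably be passed to a connected-component step and to filters that keep large, high-contrast regions; those stages are not sketched here because the abstract does not specify them.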
Pages: 329-334
Page count: 6
Related papers
5 items in total
  • [1] Extraction of Text Regions from Complex Background in Document Images by Multilevel Clustering
    Hoai Nam Vu
    Tuan Anh Tran
    Seop, Na In
    Kim, Soo Hyung
    INTERNATIONAL JOURNAL OF NETWORKED AND DISTRIBUTED COMPUTING, 2016, 4 (01) : 11 - 21
  • [2] K-Means clustering with adaptive threshold for segmentation of hand images
    Trivedi, Sheifalee
    Nandwana, Bhumika
    Khunteta, Dinesh Kumar
    Narayan, Satya
    2017 7TH INTERNATIONAL CONFERENCE ON COMMUNICATION SYSTEMS AND NETWORK TECHNOLOGIES (CSNT), 2017, : 183 - 187
  • [3] Hand Segmentation using Modified K-Means Clustering with Depth Information and Adaptive Thresholding by Histogram Analysis
    Trivedi, Sheifalee
    Khunteta, Dinesh Kumar
    Narayan, Satya
    2017 IEEE INTERNATIONAL CONFERENCE ON POWER, CONTROL, SIGNALS AND INSTRUMENTATION ENGINEERING (ICPCSI), 2017, : 1607 - 1609
  • [4] Segmentation of Meaningful Text-Regions from Camera Captured Document Images
    Dutta, Arpita
    Garai, Arpan
    Biswas, Samit
    PROCEEDINGS OF 2018 FIFTH INTERNATIONAL CONFERENCE ON EMERGING APPLICATIONS OF INFORMATION TECHNOLOGY (EAIT), 2018,
  • [5] Information Retrieves from Brain MRI Images for Tumor Detection Using Hybrid Technique K-means and Artificial Neural Network (KMANN)
    Sharma, Manorama
    Purohit, G. N.
    Mukherjee, Saurabh
    NETWORKING COMMUNICATION AND DATA KNOWLEDGE ENGINEERING, VOL 2, 2018, 4 : 145 - 157