Separation of Text from Non-Text Doodles of Poet Rabindranath Tagore's Manuscripts

Cited by: 0
Authors
Chaudhuri, B. B. [1]
Borah, Samarjeet [1]
Saraf, Ankita [1]
Goyal, Alisha [1]
Kumari, Alka [1]
Affiliations
[1] Indian Stat Inst, CVPR Unit, Kolkata 700108, India
Source
2012 NATIONAL CONFERENCE ON COMPUTING AND COMMUNICATION SYSTEMS (NCCCS) | 2012
Keywords
Text; Non text Doodles; Rabindranath Tagore; Connected Components; pixels; Stroke Width; EXTRACTION; SEGMENTATION;
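DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract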
The growing availability of internet facilities offers a convenient and faster way to mine a warehouse of both historical and contemporary handwritten documents, and this has led to continuous research and development in the field of information retrieval algorithms. In such handwritten documents, graphics and images are combined with text and often overlap one another. This paper presents a technique for separating textual data from non-textual information. The technique is based on previously published work and is applied to the manuscripts of the poet Rabindranath Tagore. The approach generates connected components as basic primitives and tries to classify each as text or non-text by comparing the total number of pixels in the component with the number of its boundary pixels. A window is then generated, and further separation is done on the basis of the stroke width computed for each window. The paper also contains a brief review of some previously published works.
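The component-level test described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 8-connectivity choice, the 4-neighbour definition of a boundary pixel, the `max_ratio` threshold and the direction of the decision rule (thin, stroke-like components count as text) are all assumptions made for illustration, and the window-based stroke-width step is not covered.

```python
from collections import deque


def connected_components(img):
    """Return the 8-connected foreground (value 1) components as lists of (row, col)."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != 1 or seen[r][c]:
                continue
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def boundary_count(img, pixels):
    """Count pixels that have a 4-neighbour which is background or off-image."""
    rows, cols = len(img), len(img[0])
    count = 0
    for y, x in pixels:
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if not (0 <= ny < rows and 0 <= nx < cols) or img[ny][nx] == 0:
                count += 1
                break
    return count


def classify(img, max_ratio=2.0):
    """Assumed rule: a low total/boundary pixel ratio means a thin stroke, i.e. text."""
    results = []
    for pixels in connected_components(img):
        ratio = len(pixels) / boundary_count(img, pixels)
        results.append(("text" if ratio <= max_ratio else "non-text", ratio))
    return results
```

Under these assumptions a one-pixel-wide pen stroke has a ratio of exactly 1, because every pixel lies on the boundary, while a filled blob has a ratio that grows with its size. A real manuscript would need the threshold tuned on sample pages.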
Pages: 165-169
Number of pages: 5