Newspaper Article Extraction Using Hierarchical Fixed Point Model

Cited by: 8
Authors
Bansal, Anukriti [1]
Chaudhury, Santanu [1]
Roy, Sumantra Dutta [1]
Srivastava, J. B. [2]
Affiliations
[1] Indian Inst Technol Delhi, Dept Elect Engn, New Delhi, India
[2] Indian Inst Technol Delhi, Dept Math, New Delhi, India
Source
2014 11TH IAPR INTERNATIONAL WORKSHOP ON DOCUMENT ANALYSIS SYSTEMS (DAS 2014) | 2014
Keywords
DOCUMENT STRUCTURE; IMAGES;
DOI
10.1109/DAS.2014.42
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents a novel learning-based framework for extracting articles from newspaper images using a fixed-point model. The input to the system comprises blocks of text and graphics obtained using standard image processing techniques. The fixed-point model uses contextual information and the features of each block to learn the layout of newspaper images and attains a contraction mapping that assigns a unique label to every block. We use a hierarchical model that works in two stages. In the first stage, a semantic label (heading, sub-heading, text block, image, or caption) is assigned to each segmented block. These labels are then used as input to the second stage, which groups related blocks into news articles. Experimental results show the applicability of our algorithm to newspaper labeling and article extraction.
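The fixed-point labeling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' learned model: the scoring rule (per-block scores plus a weighted average of neighbouring label distributions, passed through a softmax), the `context_weight` value, the three-label subset, and the toy scores are all assumptions made for illustration. With a small context weight this particular update is a contraction, so repeated application settles on a single label distribution per block.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fixed_point_labels(unary, neighbors, context_weight=0.5, tol=1e-8, max_iter=200):
    """Iterate q_i = softmax(unary_i + context_weight * mean_j q_j) to convergence.

    unary[i][c]  : hypothetical per-block score for label c (stands in for block features)
    neighbors[i] : indices of blocks adjacent to block i (stands in for layout context)
    """
    n, k = len(unary), len(unary[0])
    q = [[1.0 / k] * k for _ in range(n)]  # start from uniform label distributions
    for _ in range(max_iter):
        new_q = []
        for i in range(n):
            nbrs = neighbors[i]
            if nbrs:
                ctx = [sum(q[j][c] for j in nbrs) / len(nbrs) for c in range(k)]
            else:
                ctx = [0.0] * k
            scores = [unary[i][c] + context_weight * ctx[c] for c in range(k)]
            new_q.append(softmax(scores))
        # Largest change in any label probability during this sweep.
        delta = max(abs(a - b)
                    for new_row, old_row in zip(new_q, q)
                    for a, b in zip(new_row, old_row))
        q = new_q
        if delta < tol:
            break
    return q

# Toy page with three blocks in a vertical chain: 0 - 1 - 2 (assumed data).
LABELS = ["heading", "text", "image"]
unary = [[2.0, 0.0, 0.0], [0.0, 1.5, 0.0], [0.2, 0.3, 0.1]]
neighbors = [[1], [0, 2], [1]]

q = fixed_point_labels(unary, neighbors)
predicted = [LABELS[max(range(len(row)), key=row.__getitem__)] for row in q]
print(predicted)
```

In this toy setup the third block has nearly flat scores of its own, so its label is decided largely by its neighbour, which is the role contextual information plays in the description above. The second stage (grouping labeled blocks into articles) is not sketched here.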
Pages: 257-261
Page count: 5