Logical entity recognition in multi-style document page images
Cited by: 0
Authors:
Mao, Song [1]
Xu, Zheng [2]
Tjahjadi, Tardi [2]
Thoma, George R. [1]
Affiliations:
[1] US Natl Lib Med, Bethesda, MD 20894 USA
[2] Univ Warwick, Sch Engn, Coventry CV4 7AL, W Midlands, England
Source:
18TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 1, PROCEEDINGS | 2006
DOI:
Not available
Chinese Library Classification number:
TP18 [Artificial intelligence theory]
Discipline classification codes:
081104; 0812; 0835; 1405
Abstract:
Logical entity recognition in document page images is an essential part of a document image analysis system. A heterogeneous collection of document pages usually has many layout styles. Features extracted from the same logical entity in different styles may have very different values, and features extracted from different logical entities in different styles may have similar values. Therefore, logical entity classifiers learned from a training set of multi-style document pages may not be reliable, because the features of different logical entities in different styles can overlap. In this paper, we propose a novel method in which style information is used in both the training and recognition phases of logical entity classification. In the training phase, training data are first classified into distinct styles, and a dedicated Support Vector Machine (SVM) is then learned for each style. In the recognition phase, the style of a new document page image is first identified, and its logical entities are then recognized using the corresponding SVM. We show in our experiments that the use of style information significantly improves the accuracy of logical entity recognition in multi-style document page images.
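The two-phase scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several things in it are assumptions made only for illustration: style labels for the training pages are taken as given, style identification is done by nearest centroid over page-level features, the page features (column count, margin ratio) and the zone feature (relative font size) are hypothetical, and a margin perceptron trained with hinge-loss updates stands in for the SVM so that the example needs no third-party library.

```python
from collections import defaultdict


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_margin_classifier(X, y, lr=0.1, max_epochs=5000):
    """Margin perceptron (hinge-loss updates); a simple stand-in for an SVM."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(max_epochs):
        updated = False
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # sample violates the margin
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
                updated = True
        if not updated:  # every training sample now has margin >= 1
            break
    return w, b


def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


class StyleAwareRecognizer:
    def fit(self, pages):
        # pages: (style, page_features, [(zone_features, +1 title / -1 body)])
        groups = defaultdict(lambda: {"page": [], "X": [], "y": []})
        for style, page_feat, zones in pages:
            groups[style]["page"].append(page_feat)
            for zone_feat, label in zones:
                groups[style]["X"].append(zone_feat)
                groups[style]["y"].append(label)
        # One style centroid and one dedicated classifier per style.
        self.centroids = {s: centroid(g["page"]) for s, g in groups.items()}
        self.models = {s: train_margin_classifier(g["X"], g["y"])
                       for s, g in groups.items()}
        return self

    def identify_style(self, page_feat):
        return min(self.centroids,
                   key=lambda s: sq_dist(self.centroids[s], page_feat))

    def recognize(self, page_feat, zone_feats):
        # Identify the page style first, then apply that style's classifier.
        style = self.identify_style(page_feat)
        w, b = self.models[style]
        labels = ["title" if dot(w, x) + b >= 0 else "body" for x in zone_feats]
        return style, labels


# Hypothetical toy data: page features are (column count, margin ratio),
# the single zone feature is relative font size.
TRAINING_PAGES = [
    ("A", [1.0, 0.20], [([0.90], +1), ([0.40], -1)]),
    ("A", [1.0, 0.25], [([0.80], +1), ([0.35], -1)]),
    ("B", [2.0, 0.10], [([0.40], +1), ([0.10], -1)]),
    ("B", [2.0, 0.12], [([0.45], +1), ([0.15], -1)]),
]

recognizer = StyleAwareRecognizer().fit(TRAINING_PAGES)
print(recognizer.recognize([1.0, 0.22], [[0.85], [0.40]]))
print(recognizer.recognize([2.0, 0.11], [[0.40], [0.12]]))
```

In this toy data the font-size value 0.40 belongs to a body zone in style A and to a title zone in style B, so a single classifier trained on the pooled zones could not label both correctly, while the per-style classifiers can. In practice the stand-in trainer would be replaced by a real SVM implementation.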
Pages: 876 / +
Page count: 2