Video text recognition using sequential Monte Carlo and error voting methods

Cited by: 32
Authors
Chen, DT [1 ]
Odobez, JM [1 ]
Affiliation
[1] IDIAP Res Inst, CH-1920 Martigny, Valais, Switzerland
Keywords
video text recognition; text segmentation; sequential Monte-Carlo filter; language model; recognition output voting error reduction;
DOI
10.1016/j.patrec.2004.11.019
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper addresses the segmentation and recognition of text embedded in video sequences, starting from the associated text image sequence extracted by a text detection module. To this end, we propose a probabilistic algorithm based on Bayesian adaptive thresholding and Monte-Carlo sampling. The algorithm approximates the posterior distribution of the segmentation thresholds of text pixels in an image by a set of weighted samples. The set of samples is initialized by applying a classical segmentation algorithm to the first video frame and is further refined by random sampling under a temporal Bayesian framework. One important contribution of the paper is to show that, thanks to the proposed methodology, the likelihood of a segmentation parameter sample can be estimated not from a classification criterion or a visual quality criterion based on the produced segmentation map, but directly from the induced text recognition result, which is directly relevant to our task. As a second contribution, we propose to align the text recognition results from high-confidence samples gathered over time and to compose a final result using an error voting technique (ROVER) at the character level. Experiments are conducted on a two-hour video database. Character recognition rates higher than 93% and word recognition rates higher than 90% are achieved, which are 4% and 3% more than state-of-the-art methods applied to the same database. © 2004 Elsevier B.V. All rights reserved.
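As a rough illustration of the two ideas in the abstract, the following minimal Python sketch runs a sequential Monte Carlo loop over segmentation thresholds and then applies a character-level majority vote. It is not the authors' implementation. The names `recognition_score`, `smc_step`, and `vote`, the toy likelihood, the noise level, and the "ideal" threshold of 120 are all assumptions made for illustration; in the paper the likelihood comes from the text recognition result, and ROVER first aligns the strings, whereas this sketch assumes they are already aligned and of equal length.

```python
import random
from collections import Counter


def recognition_score(threshold, frame):
    """Hypothetical stand-in for an OCR-based likelihood.

    Assumption: the score peaks when the threshold equals the frame's
    'ideal' value, which a real system would not know in advance.
    """
    return 1.0 / (1.0 + (threshold - frame["ideal"]) ** 2 / 100.0)


def smc_step(particles, frame, rng, noise=5.0):
    """One predict / weight / resample step over threshold samples."""
    # Predict: perturb each threshold with Gaussian noise, clamped to [0, 255].
    moved = [min(255.0, max(0.0, p + rng.gauss(0.0, noise))) for p in particles]
    # Weight: score each sample with the (toy) recognition likelihood.
    weights = [recognition_score(p, frame) for p in moved]
    # Resample: draw a new set with probability proportional to the weights.
    return rng.choices(moved, weights=weights, k=len(moved))


def vote(strings):
    """Character-level majority vote over pre-aligned, equal-length strings."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*strings))


rng = random.Random(0)
# Toy initialization: uniform samples instead of a classical segmentation.
particles = [rng.uniform(0.0, 255.0) for _ in range(200)]
for _ in range(20):  # 20 synthetic frames sharing the same ideal threshold
    particles = smc_step(particles, {"ideal": 120.0}, rng)
estimate = sum(particles) / len(particles)

# Combine recognition outputs from several hypothetical samples.
combined = vote(["HELLO", "HELL0", "HELLO"])
```

In this toy setting the samples concentrate around the assumed ideal threshold, and the vote discards the isolated `0` that one output substituted for `O`.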
Pages: 1386-1403
Page count: 18