Video text recognition using sequential Monte Carlo and error voting methods

Times cited: 32

Authors:

Chen, DT ^{[1]}

Odobez, JM ^{[1]}

Affiliations:

[1] IDIAP Res Inst, CH-1920 Martigny, Valais, Switzerland

Source:

PATTERN RECOGNITION LETTERS | 2005 / Vol. 26 / Issue 09

Keywords:

video text recognition; text segmentation; sequential Monte-Carlo filter; language model; recognition output voting error reduction;

DOI:

10.1016/j.patrec.2004.11.019

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper addresses the segmentation and recognition of text embedded in video sequences, starting from the associated text image sequence extracted by a text detection module. To this end, we propose a probabilistic algorithm based on Bayesian adaptive thresholding and Monte-Carlo sampling. The algorithm approximates the posterior distribution of the segmentation thresholds of text pixels in an image by a set of weighted samples. The set of samples is initialized by applying a classical segmentation algorithm to the first video frame and is further refined by random sampling under a temporal Bayesian framework. One important contribution of the paper is to show that, thanks to the proposed methodology, the likelihood of a segmentation parameter sample can be estimated not from a classification criterion or a visual quality criterion based on the produced segmentation map, but directly from the induced text recognition result, which is directly relevant to our task. Furthermore, as a second contribution of the paper, we propose to align text recognition results from high-confidence samples gathered over time and to compose a final result using an error voting technique (ROVER) at the character level. Experiments are conducted on a two-hour video database. Character recognition rates higher than 93% and word recognition rates higher than 90% are achieved, which are 4% and 3% higher than those of state-of-the-art methods applied to the same database. © 2004 Elsevier B.V. All rights reserved.
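
As a rough illustration of the two ideas summarized in the abstract (a weighted sample set over segmentation thresholds, and character-level voting over recognition results), the following Python sketch may help. It is not the authors' implementation. The Gaussian perturbation, the `likelihood` callable (standing in for a score derived from the recognition output), the two-threshold sample shape, and the assumption that hypotheses arrive already aligned with a `-` gap symbol are all illustrative assumptions. ROVER itself builds the alignment by dynamic programming, which is omitted here.

```python
import random
from collections import Counter


def smc_step(samples, weights, likelihood, noise=5.0, rng=random):
    """One resample / propagate / re-weight update over threshold samples.

    samples    : list of tuples of gray-level thresholds (assumed shape)
    weights    : list of non-negative weights, one per sample
    likelihood : hypothetical callable scoring a sample, e.g. from OCR confidence
    """
    n = len(samples)
    # Resample in proportion to the current weights.
    resampled = rng.choices(samples, weights=weights, k=n)
    # Propagate: perturb each threshold with Gaussian noise (assumed dynamics),
    # clamped to the 8-bit gray-level range.
    proposed = [
        tuple(min(255.0, max(0.0, t + rng.gauss(0.0, noise))) for t in s)
        for s in resampled
    ]
    # Re-weight each proposed sample and normalize.
    raw = [likelihood(s) for s in proposed]
    total = sum(raw)
    if total <= 0:
        new_weights = [1.0 / n] * n  # fall back to uniform weights
    else:
        new_weights = [w / total for w in raw]
    return proposed, new_weights


def vote_characters(aligned, gap="-"):
    """Confidence-weighted vote per position over pre-aligned strings.

    aligned : list of (text, confidence) pairs; all texts have equal length
              and use `gap` where a hypothesis has no character.
    """
    length = len(aligned[0][0])
    if any(len(text) != length for text, _ in aligned):
        raise ValueError("hypotheses must be aligned to the same length")
    result = []
    for i in range(length):
        scores = Counter()
        for text, conf in aligned:
            scores[text[i]] += conf
        best = max(scores, key=scores.get)
        if best != gap:  # a winning gap means "no character here"
            result.append(best)
    return "".join(result)
```

In this sketch, `smc_step` would be called once per video frame with a likelihood computed from the recognizer's output on the segmentation produced by each sample, and `vote_characters` would be applied to the recognition strings collected from high-confidence samples over time.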


Pages: 1386-1403

Page count: 18

References (27 in total):

[1] [Anonymous], IEEE T SIGNAL PROCES

[2] [Anonymous], P ACM INT C DIG LIB

[3] Chen, DT; Odobez, JM; Thiran, JP. A localization/verification scheme for finding text in images and video frames based on contrast independent features and machine learning methods [J]. SIGNAL PROCESSING-IMAGE COMMUNICATION, 2004, 19 (03): 205-217

[4] Chen JY, 2004, LETHAIA, V37, P3, DOI 10.1080/00241160410004764

[5] Clark P.; Mirmehdi M. Recognising text in real scenes [J]. International Journal on Document Analysis and Recognition, 2002, 4 (4): 243-257

[6] Doucet A., 2001, SEQUENTIAL MONTE CAR

[7] Fiscus, JG. A post-processing system to yield reduced word error rates: Recognizer output voting error reduction (ROVER) [J]. 1997 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, PROCEEDINGS, 1997: 347-354

[8] Hori O., 1999, Proceedings of the Fifth International Conference on Document Analysis and Recognition. ICDAR '99 (Cat. No.PR00318), P25, DOI 10.1109/ICDAR.1999.791716

[9] ISARD M, 1996, 4 EUR C COMP VIS, V1, P343

[10] Kamada H., 1999, Proceedings of the Fifth International Conference on Document Analysis and Recognition. ICDAR '99 (Cat. No.PR00318), P139, DOI 10.1109/ICDAR.1999.791744
