Semiautomatic Ground Truth Generation for Text Detection and Recognition in Video Images

Cited by: 6

Authors:

Trung Quy Phan [1]

Shivakumara, Palaiahnakote [2]

Bhowmick, Souvik [4]

Li, Shimiao [3]

Tan, Chew Lim [1]

Pal, Umapada [4]

Affiliations:

[1] Natl Univ Singapore, Dept Comp Sci, Sch Comp, Singapore 117417, Singapore

[2] Univ Malaya, Dept Comp Syst & Informat Technol, Kuala Lumpur 50603, Malaysia

[3] Inst Infocomm Res I2R, Singapore 138632, Singapore

[4] Indian Stat Inst, Kolkata 700108, India

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2014 / Vol. 24 / No. 08

Keywords:

Chinese video text recognition; ground truthing; multioriented video text detection and recognition; video text detection; video text recognition; PERFORMANCE EVALUATION; TRACKING; LOCALIZATION; EXTRACTION;

DOI:

10.1109/TCSVT.2014.2305515

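Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification code: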

0808 ; 0809 ;

Abstract:

Although a large number of methods for video text detection and recognition have been proposed over the past years, it is hard to find the best state-of-the-art method because of nonavailability of standard datasets, ground truth, and common evaluation measures. Therefore, in this paper, we propose a semiautomatic system for ground truth generation for video text detection and recognition, which includes English and Chinese text of different orientation. The system has a facility to allow the user to manually correct the ground truth if the automatic method produces incorrect results. We propose eleven attributes at the word level, namely: line index, word index, coordinate values of bounding box, area, content, script type, orientation information, type of text (caption/scene), condition of text (distortion/distortion free), start frame, and end frame to evaluate the performance of the method. We also introduce a new dataset that consists of 466 video frames collected from TRECVID 2005 and 2006 databases. The video frames in our dataset contain both horizontal texts (278 frames: 181 with English texts and 97 with Chinese texts) and nonhorizontal texts (188 frames: 140 English and 48 Chinese). Furthermore, the performance of the proposed system is compared with existing text detection methods by calculating measures manually and automatically to show usefulness of our semiautomatic system. The ground truth and the semiautomatic system will be released to the public.
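As an illustration only, the eleven word-level attributes listed in the abstract could be held in a record like the one sketched below. The class name, field names, field types, and the sample values are all assumptions made for this sketch; the abstract does not specify the authors' actual file format or schema. The frame counts in the dictionary are taken directly from the abstract.

```python
from dataclasses import dataclass, fields
from typing import Tuple


@dataclass
class WordGroundTruth:
    """Hypothetical word-level record; names and types are assumptions."""
    line_index: int
    word_index: int
    bounding_box: Tuple[Tuple[float, float], ...]  # corner (x, y) coordinates
    area: float
    content: str
    script_type: str   # e.g. "English" or "Chinese"
    orientation: str   # e.g. "horizontal" or "nonhorizontal"
    text_type: str     # "caption" or "scene"
    condition: str     # "distortion" or "distortion free"
    start_frame: int
    end_frame: int


# Frame counts as stated in the abstract, keyed by (orientation, script).
FRAME_COUNTS = {
    ("horizontal", "English"): 181,
    ("horizontal", "Chinese"): 97,
    ("nonhorizontal", "English"): 140,
    ("nonhorizontal", "Chinese"): 48,
}


def total_frames(counts=FRAME_COUNTS) -> int:
    """Sum the per-category frame counts."""
    return sum(counts.values())


# Made-up sample values, used only to show how a record is filled in.
example = WordGroundTruth(
    line_index=0,
    word_index=0,
    bounding_box=((10.0, 20.0), (110.0, 20.0), (110.0, 50.0), (10.0, 50.0)),
    area=3000.0,
    content="NEWS",
    script_type="English",
    orientation="horizontal",
    text_type="caption",
    condition="distortion free",
    start_frame=1,
    end_frame=30,
)
```

Summing the four categories reproduces the dataset size given in the abstract: 181 + 97 = 278 horizontal frames, 140 + 48 = 188 nonhorizontal frames, and 466 frames in total.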


Pages: 1277-1287

Page count: 11

