A Unified Framework for Tracking Based Text Detection and Recognition from Web Videos

Times cited: 49

Authors:

Tian, Shu [1]

Yin, Xu-Cheng [2]

Su, Ya [1]

Hao, Hong-Wei [3]

Affiliations:

[1] Univ Sci & Technol Beijing, Sch Comp & Commun Engn, Dept Comp Sci & Technol, Beijing 100083, Peoples R China

[2] Univ Sci & Technol Beijing, Sch Comp & Commun Engn, Dept Comp Sci & Technol, Beijing Key Lab Mat Sci Knowledge Engn, Beijing 100083, Peoples R China

[3] Chinese Acad Sci, Res Ctr Digital Technol, Inst Automat, Beijing 100190, Peoples R China

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2018 / Vol. 40 / No. 03

Funding:

National Natural Science Foundation of China;

Keywords:

Video text extraction; text tracking; tracking based text detection; tracking based text recognition; embedded captions; READING TEXT; DCT FEATURE; IMAGES; SEGMENTATION; EXTRACTION;

DOI:

10.1109/TPAMI.2017.2692763

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104; 0812; 0835; 1405;

Abstract:

Video text extraction plays an important role in multimedia understanding and retrieval. Most previous research efforts have been conducted within individual frames. A few recent methods attend to text tracking across multiple frames; however, they do not effectively exploit the relations among text detection, tracking, and recognition. In this paper, we propose a generic Bayesian-based framework for Tracking-based Text Detection And Recognition (T²DAR) of embedded captions in web videos, which is composed of three major components: text tracking, tracking-based text detection, and tracking-based text recognition. In this unified framework, text tracking is first conducted by tracking-by-detection. Tracking trajectories are then revised and refined with detection or recognition results. Text detection or recognition is finally improved through multi-frame integration. Moreover, a challenging video text (embedded caption text) database, USTB-VidTEXT, is constructed and made publicly available. A variety of experiments on this dataset verify that the proposed approach substantially improves the performance of text detection and recognition in web videos.
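The abstract describes the framework only at a high level. As a rough illustration of the two generic ideas it names, tracking-by-detection and multi-frame integration, here is a minimal Python sketch. It is not the authors' Bayesian T²DAR method: it swaps in greedy IoU matching to link per-frame caption detections into trajectories, and a simple majority vote over per-frame recognition strings. The names `track`, `integrate`, and `Trajectory`, the `iou_thresh` and `max_gap` parameters, and the `(box, text)` input format are all hypothetical choices for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical box format: (x1, y1, x2, y2) in pixels.
Box = tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Trajectory:
    boxes: list[Box] = field(default_factory=list)
    texts: list[str] = field(default_factory=list)
    last_frame: int = -1


def track(frames: list[list[tuple[Box, str]]],
          iou_thresh: float = 0.5, max_gap: int = 1) -> list[Trajectory]:
    """Greedy tracking-by-detection (a simplified stand-in, not T2DAR).

    Each frame holds (box, recognized_text) pairs from a per-frame
    detector and recognizer. A detection joins the unused active
    trajectory whose last box overlaps it most (IoU >= iou_thresh);
    otherwise it starts a new trajectory.
    """
    trajectories: list[Trajectory] = []
    for t, detections in enumerate(frames):
        # Only trajectories seen within the last max_gap frames can be extended.
        active = [tr for tr in trajectories if t - tr.last_frame <= max_gap]
        used: set[int] = set()
        for box, text in detections:
            best, best_iou = None, iou_thresh
            for tr in active:
                if id(tr) in used:
                    continue  # at most one detection per trajectory per frame
                score = iou(tr.boxes[-1], box)
                if score >= best_iou:
                    best, best_iou = tr, score
            if best is None:
                best = Trajectory()
                trajectories.append(best)
            best.boxes.append(box)
            best.texts.append(text)
            best.last_frame = t
            used.add(id(best))
    return trajectories


def integrate(tr: Trajectory) -> str:
    """Multi-frame integration by majority vote over per-frame strings."""
    return Counter(tr.texts).most_common(1)[0][0]
```

For instance, three consecutive frames in which the same caption box is read as "HELLO", "HELL0", and "HELLO" form a single trajectory whose integrated reading is "HELLO". Majority voting is only the simplest form of multi-frame integration; the paper's Bayesian formulation, which also feeds detection and recognition results back into trajectory refinement, is not reproduced here.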

Pages: 542-554

Number of pages: 13

References: 54

