Automatic video superimposed text detection based on Nonsubsampled Contourlet Transform

Times cited: 5

Author:

Huang, Xiaodong [1]

Affiliation:

[1] Capital Normal Univ, Beijing 100048, Peoples R China

Source:

MULTIMEDIA TOOLS AND APPLICATIONS | 2018 / Vol. 77 / Issue 06

Funding:

Beijing Natural Science Foundation;

Keywords:

Video text; Nonsubsampled Contourlet Transform; Text detection; Text frame classification; LOCALIZATION; FEATURES; EXTRACTION; SCENE;

DOI:

10.1007/s11042-017-4619-8

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Compared with other video semantic cues, such as gestures and motions, video text generally provides highly useful and fairly precise semantic information, and analyzing it can greatly facilitate video and scene understanding. Video text is also observed to exhibit strong edges. The Nonsubsampled Contourlet Transform (NSCT) is a fully shift-invariant, multi-scale, and multi-directional expansion that preserves the edges and silhouettes of text characters well. This paper therefore proposes a new video text detection approach based on the NSCT. First, the 8 directional coefficients of the NSCT are combined to build a directional edge map (DEM), which keeps the horizontal, vertical, and diagonal edge features and suppresses edge features in other directions. Next, the pixels of the different directions in the DEM are merged into a single binary image (BE). Based on the BE, text frame classification is performed to determine whether each video frame contains text lines. Finally, text detection based on the BE is performed on consecutive frames to discriminate video text from non-text regions. Experimental evaluations on our collected TV video data set demonstrate that our method significantly outperforms 3 other video text detection algorithms in both detection speed and accuracy, especially under challenging conditions such as video text of various sizes, languages, colors, and fonts, and short or long text lines.
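The abstract names the DEM, BE, and text frame classification stages but not their parameters. As a rough, hypothetical illustration of those three stages only, the Python sketch below assumes that the eight NSCT directional subbands of one scale have already been computed by some NSCT implementation (not shown) as equally sized 2-D arrays, and that the caller knows which subband indices (`keep`) correspond to the horizontal, vertical, and diagonal directions. The combination rule (per-pixel maximum magnitude), the fixed binarization `threshold`, and the edge-pixel-ratio frame test with `min_ratio` are assumptions chosen for illustration, not the method described in the paper, and all function names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]
BinaryImage = List[List[int]]


def directional_edge_map(subbands: Sequence[Matrix], keep: Sequence[int]) -> Matrix:
    """Hypothetical DEM: combine the kept directional subbands.

    Assumption: each pixel takes the largest absolute coefficient among the
    subbands listed in `keep`; all other directions are ignored (suppressed).
    """
    if not keep:
        raise ValueError("keep must list at least one subband index")
    rows, cols = len(subbands[0]), len(subbands[0][0])
    dem = [[0.0] * cols for _ in range(rows)]
    for k in keep:
        band = subbands[k]
        for r in range(rows):
            for c in range(cols):
                dem[r][c] = max(dem[r][c], abs(band[r][c]))
    return dem


def binarize(dem: Matrix, threshold: float) -> BinaryImage:
    """Hypothetical BE: mark a pixel as an edge pixel (1) when it reaches `threshold`."""
    return [[1 if v >= threshold else 0 for v in row] for row in dem]


def is_text_frame(binary: BinaryImage, min_ratio: float) -> bool:
    """Hypothetical frame classifier: a high enough edge-pixel ratio suggests text."""
    total = sum(len(row) for row in binary)
    edges = sum(sum(row) for row in binary)
    return total > 0 and edges / total >= min_ratio


# Toy example: eight 2x2 "subbands", mostly zero.
bands = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(8)]
bands[0] = [[5.0, 0.0], [0.0, 0.0]]
bands[4] = [[0.0, -6.0], [0.0, 0.0]]
bands[1] = [[0.0, 0.0], [9.0, 0.0]]  # direction not kept, so it is suppressed

dem = directional_edge_map(bands, keep=[0, 2, 4, 6])
be = binarize(dem, threshold=4.0)
print(be)                                 # [[1, 1], [0, 0]]
print(is_text_frame(be, min_ratio=0.25))  # True
```

The per-pixel maximum is used here only because it is simple and keeps the edge map on the same scale as the individual subbands; a sum or a weighted combination would be equally plausible. The sketch also omits the consecutive-frame text localization step.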


Pages: 7033-7049

Page count: 17

Cited references: 21 in total (entries [1]-[10] listed below)

[1] On Detection of Multiple Object Instances Using Hough Transforms [J]. Barinova, Olga; Lempitsky, Victor; Kohli, Pushmeet. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2012, 34 (09): 1773-1784

[2] Dynamically Modulated Mask Sparse Tracking [J]. Chen, Zijing; You, Xinge; Zhong, Boxuan; Li, Jun; Tao, Dacheng. IEEE TRANSACTIONS ON CYBERNETICS, 2017, 47 (11): 3706-3718

[3] The nonsubsampled contourlet transform: Theory, design, and applications [J]. da Cunha, Arthur L.; Zhou, Jianping; Do, Minh N. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2006, 15 (10): 3089-3101

[4] Detection of Dynamic Background Due to Swaying Movements From Motion Features [J]. Duc-Son Pham; Arandjelovic, Ognjen; Venkatesh, Svetha. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2015, 24 (01): 332-344

[5] Detecting both superimposed and scene text with multiple languages and multiple alignments in video [J]. Huang, Xiaodong; Ma, Huadong; Ling, Charles X.; Gao, Guangyu. MULTIMEDIA TOOLS AND APPLICATIONS, 2014, 70 (03): 1703-1727

[6] Jing Zhang, 2010, Proceedings of the 2010 20th International Conference on Pattern Recognition (ICPR 2010), P3979, DOI 10.1109/ICPR.2010.968

[7] Accurate text localization in images based on SVM output scores [J]. Jung, Cheolkon; Liu, Qifeng; Kim, Joongkyu. IMAGE AND VISION COMPUTING, 2009, 27 (09): 1295-1301

[8] A stroke filter and its application to text localization [J]. Jung, Cheolkon; Liu, Qifeng; Kim, Joongkyu. PATTERN RECOGNITION LETTERS, 2009, 30 (02): 114-122

[9] A New Approach for Overlay Text Detection and Extraction From Complex Video Scene [J]. Kim, Wonjun; Kim, Changick. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2009, 18 (02): 401-411

[10] Lu S, 2008, INT CONF ACOUST SPEE, P1341
