Multi-Spectral Fusion Based Approach for Arbitrarily Oriented Scene Text Detection in Video Images

Cited by: 43

Authors:

Liang, Guozhu [1]

Shivakumara, Palaiahnakote [2]

Lu, Tong [1]

Tan, Chew Lim [3]

Affiliations:

[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing 210023, Jiangsu, Peoples R China

[2] Univ Malaya, Fac Comp Sci & Informat Technol, Kuala Lumpur 50603, Malaysia

[3] Natl Univ Singapore, Sch Comp, Singapore 119077, Singapore

Source:

IEEE TRANSACTIONS ON IMAGE PROCESSING | 2015 / Volume 24 / Issue 11

Funding:

U.S. National Science Foundation;

Keywords:

Laplacian-wavelet; multi spectral fusion; maxima stable extreme regions; stroke width transform; arbitrarily oriented video text detection; EXTRACTION; GRADIENT; RECOGNITION; SCHEME;

DOI:

10.1109/TIP.2015.2465169

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Scene text detection from video as well as natural scene images is challenging due to the variations in background, contrast, text type, font type, font size, and so on. Besides, arbitrary orientations of texts with multi-scripts add more complexity to the problem. The proposed approach introduces a new idea of convolving Laplacian with wavelet sub-bands at different levels in the frequency domain for enhancing low resolution text pixels. Then, the results obtained from different sub-bands (spectral) are fused for detecting candidate text pixels. We explore maxima stable extreme regions along with stroke width transform for detecting candidate text regions. Text alignment is done based on the distance between the nearest neighbor clusters of candidate text regions. In addition, the approach presents a new symmetry driven nearest neighbor for restoring full text lines. We conduct experiments on our collected video data as well as several benchmark data sets, such as ICDAR 2011, ICDAR 2013, and MSRA-TD500 to evaluate the proposed method. The proposed approach is compared with the state-of-the-art methods to show its superiority to the existing methods.


Pages: 4488-4501

Number of pages: 14

Number of references: 39
