Bag-of-Visual Words for Word-Wise Video Script Identification: A Study

Cited by: 6

Authors:

Sharma, Nabin ^{[1]}

Mandal, Ranju ^{[1]}

Sharma, Rabi ^{[2]}

Pal, Umapada ^{[2]}

Blumenstein, Michael ^{[3]}

Affiliations:

[1] Griffith Univ, Sch ICT, Nathan, Qld 4222, Australia

[2] Indian Stat Inst, CVPR Unit, Kolkata 700108, India

[3] Griffith Univ, Sch ICT, Nathan, Qld 4222, Australia

Source:

2015 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | 2015

Keywords:

SCALE;

DOI:

10.1109/IJCNN.2015.7280631

Chinese Library Classification number:

TP18 [Artificial Intelligence Theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The use of multiple scripts for communicating information through various media is quite common in a multilingual country. Optical character recognition (OCR) of such document images or videos assists in indexing them for effective information retrieval. Hence, script identification from multilingual documents/images is a necessary step for selecting the appropriate OCR, owing to the absence of a single OCR system capable of handling multiple scripts. Script identification from printed as well as handwritten documents is a well-researched area, but script identification from video frames has not been explored much. Low resolution, blur and noisy backgrounds, to mention a few, are the major bottlenecks when processing video frames, and they make script identification from video images a challenging task. This paper examines the potential of Bag-of-Visual Words based techniques for word-wise script identification from video frames. Two different approaches, namely Bag-of-Features (BoF) and Spatial Pyramid Matching (SPM), using patch-based SIFT descriptors were considered for the current study. An SVM classifier was used for analysing three popular South Indian scripts, namely Tamil, Telugu and Kannada, in combination with English and Hindi. A comparative study of Bag-of-Visual Words with traditional script identification techniques involving gradient-based features (e.g. HoG) and texture-based features (e.g. LBP) is presented. Experimental results show that patch-based features along with SPM outperformed the traditional techniques, and promising accuracies were achieved on 2534 words from the five scripts. The study reveals that patch-based features can be used for script identification in order to overcome the inherent problems with video frames.
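
The two representations named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the codebook is assumed to be given (in practice it would typically be learned by clustering SIFT descriptors), the function names `nearest_word`, `bof_histogram` and `spm_histogram` are hypothetical, and the per-level weights and matching kernel of full Spatial Pyramid Matching are omitted. The resulting vectors are the kind of input one would pass to an SVM classifier.

```python
from math import dist


def nearest_word(descriptor, codebook):
    """Index of the codebook vector closest to the descriptor (Euclidean)."""
    return min(range(len(codebook)), key=lambda k: dist(descriptor, codebook[k]))


def bof_histogram(descriptors, codebook):
    """Bag-of-Features: L1-normalised histogram of visual-word counts."""
    counts = [0.0] * len(codebook)
    for d in descriptors:
        counts[nearest_word(d, codebook)] += 1.0
    total = sum(counts)
    # An empty cell yields an all-zero histogram instead of dividing by zero.
    return [c / total for c in counts] if total else counts


def spm_histogram(patches, codebook, width, height, levels=2):
    """Simplified spatial pyramid: concatenate BoF histograms computed over
    a 1x1 grid, then a 2x2 grid, and so on (assumption: no level weighting).

    patches: list of (x, y, descriptor) with 0 <= x <= width, 0 <= y <= height.
    """
    feature = []
    for level in range(levels):
        cells = 2 ** level
        for row in range(cells):
            for col in range(cells):
                in_cell = [
                    d for x, y, d in patches
                    if min(int(x * cells / width), cells - 1) == col
                    and min(int(y * cells / height), cells - 1) == row
                ]
                feature.extend(bof_histogram(in_cell, codebook))
    return feature


# Toy data: 2-D "descriptors" stand in for 128-D SIFT vectors.
codebook = [(0.0, 0.0), (10.0, 10.0)]
hist = bof_histogram([(1.0, 1.0), (9.0, 9.0), (0.0, 1.0)], codebook)
patches = [(1, 1, (0.0, 0.0)), (9, 1, (10.0, 10.0))]
feature = spm_histogram(patches, codebook, width=10, height=10, levels=2)
```

With two pyramid levels and a two-word codebook, the SPM vector has (1 + 4) x 2 = 10 entries. Unlike the plain BoF histogram, it records in which part of the word image each visual word occurred.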


Pages: 7
