Detection and recognition of cursive text from video frames

Cited by: 7
Authors
Mirza, Ali [1]
Zeshan, Ossama [1]
Atif, Muhammad [1]
Siddiqi, Imran [1]
Affiliations
[1] Bahria Univ, Islamabad, Pakistan
Keywords
Text detection; Text recognition; Script identification; Deep neural networks (DNNs); Convolutional neural networks (CNNs); Long short-term memory (LSTM) networks; Caption text; Cursive text; ARTIFICIAL URDU TEXT; NATURAL SCENE IMAGE; SCRIPT IDENTIFICATION; NEURAL-NETWORK; HYBRID APPROACH; LOCALIZATION; FEATURES; REPRESENTATION; SEGMENTATION; EXTRACTION;
DOI
10.1186/s13640-020-00523-5
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification codes
0808; 0809;
Abstract
Textual content appearing in videos represents an interesting index for semantic retrieval of videos (from archives), generation of alerts (live streams), as well as high-level applications like opinion mining and content summarization. The key components of such systems are detection and recognition of textual content, which are also the subject of our study. This paper presents a comprehensive framework for detection and recognition of textual content in video frames. More specifically, we target cursive scripts, taking Urdu text as a case study. Detection of textual regions in video frames is carried out by fine-tuning deep neural network based object detectors for the specific case of text detection. The script of the detected textual content is identified using convolutional neural networks (CNNs), while for recognition we propose UrduNet, a combination of CNNs and long short-term memory (LSTM) networks. A benchmark dataset containing cursive text, with more than 13,000 video frames, is also developed. A comprehensive series of experiments is carried out, reporting an F-measure of 88.3% for detection and a recognition rate of 87%.
Pages: 19
Related papers
50 in total
[31]   Detecting text in video frames [J].
Anthimopoulos, M. ;
Gatos, B. ;
Pratikakis, I. ;
Perantonis, S. J. .
PROCEEDINGS OF THE FOURTH IASTED INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, PATTERN RECOGNITION, AND APPLICATIONS, 2007, :39-+
[32]   Cursive Stroke Sequencing for Handwritten Text Documents Recognition [J].
Panwar, Subhash ;
Nain, Neeta .
2013 FOURTH NATIONAL CONFERENCE ON COMPUTER VISION, PATTERN RECOGNITION, IMAGE PROCESSING AND GRAPHICS (NCVPRIPG), 2013,
[33]   A Video Text Detection and Tracking System [J].
Yusufu, Tuoerhongjiang ;
Wang, Yiqing ;
Fang, Xiangzhong .
2013 IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA (ISM), 2013, :522-529
[34]   A Comprehensive Method for Arabic Video Text Detection, Localization, Extraction and Recognition [J].
Ben Halima, M. ;
Karray, H. ;
Alimi, A. M. .
ADVANCES IN MULTIMEDIA INFORMATION PROCESSING-PCM 2010, PT II, 2010, 6298 :648-659
[35]   Semiautomatic Ground Truth Generation for Text Detection and Recognition in Video Images [J].
Trung Quy Phan ;
Shivakumara, Palaiahnakote ;
Bhowmick, Souvik ;
Li, Shimiao ;
Tan, Chew Lim ;
Pal, Umapada .
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2014, 24 (08) :1277-1287
[36]   A two-stage scheme for text detection in video images [J].
Anthimopoulos, Marios ;
Gatos, Basilis ;
Pratikakis, Ioannis .
IMAGE AND VISION COMPUTING, 2010, 28 (09) :1413-1426
[37]   A Video Text Detection Method Based on Key Text Points [J].
Li, Zhi ;
Liu, Guizhong ;
Qian, Xueming ;
Wang, Chen ;
Ma, Yana ;
Yang, Yang .
ADVANCES IN MULTIMEDIA INFORMATION PROCESSING-PCM 2010, PT I, 2010, 6297 :284-295
[38]   Hybrid approach for Farsi/Arabic text detection and localisation in video frames [J].
Moradi, Mohieddin ;
Mozaffari, Saeed .
IET IMAGE PROCESSING, 2013, 7 (02) :154-164
[39]   Rotation and script independent text detection from video frames using sub pixel mapping [J].
Mittal, Anshul ;
Roy, Partha Pratim ;
Singh, Priyanka ;
Raman, Balasubramanian .
JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2017, 46 :187-198
[40]   Video text detection and segmentation for optical character recognition [J].
Chong-Wah Ngo ;
Chi-Kwong Chan .
Multimedia Systems, 2005, 10 :261-272