Localization of Pashto Text in the Video Frames Using Deep Learning

Cited by: 0

Authors:

Tanveer, Syeda Freiha [1]

Shah, Sajid [1,2]

Khan, Ahmad [1]

ELAffendi, Mohammed [2]

Ali, Gauhar [2]

Affiliations:

[1] COMSATS Univ, Abbottabad Campus, Islamabad, Pakistan

[2] Prince Sultan Univ, EIAS Lab, CCIS, Riyadh, Saudi Arabia

Source:

ADVANCES IN CYBERSECURITY, CYBERCRIMES, AND SMART EMERGING TECHNOLOGIES | 2023, Vol. 4

Keywords:

Localization; Pashto; Deep learning; YOLO; Darknet; ARTIFICIAL URDU TEXT

DOI:

10.1007/978-3-031-21101-0_22

Chinese Library Classification (CLC):

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Object detection has remained an attractive and challenging task for the computer vision research community. Along with other objects, researchers have tried to detect text in images and videos. Earlier approaches used handcrafted features to detect text in images and videos. These features have low discriminative power, leading to poor performance of the underlying machine learning model. Adding more features to boost discriminative ability increases the data dimensionality, and the performance of conventional machine learning usually falls as dimensionality grows. Deep learning learns features by itself, which is known as representation learning, and it also performs better on high-dimensional data owing to its data-hungry nature. A deep neural network is an end-to-end system that is fully automated and does not require any handcrafting. Arabic, Urdu, and a few other languages have previously been detected in videos, but those works mostly used handcrafted features to localize text, which perform poorly on high-dimensional data. Pashto, being a superset of Arabic, Urdu, and Persian, has remained unattended by researchers. The contribution of this work is twofold: (i) dataset generation and annotation, and (ii) the use of a deep learning model for Pashto text localization. Since this is pioneering work on Pashto text localization, no comparison with the state of the art is conducted. We obtained good results, with an IoU above 80% and a recall of 0.98.
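The abstract reports results as IoU and recall without defining them. The sketch below shows how these two metrics are commonly computed for axis-aligned bounding boxes. It is an illustration only: the `(x1, y1, x2, y2)` box format, the function names, and the 0.5 matching threshold are assumptions, not details taken from the paper, and the recall here is a simplified version that does not enforce one-to-one matching between predictions and ground truth.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall(ground_truth, predictions, threshold=0.5):
    """Fraction of ground-truth boxes covered by some prediction.

    A ground-truth box counts as found when at least one predicted box
    overlaps it with IoU >= threshold (assumed value, not from the paper).
    """
    if not ground_truth:
        return 0.0
    found = sum(
        1 for gt in ground_truth
        if any(iou(gt, p) >= threshold for p in predictions)
    )
    return found / len(ground_truth)
```

For example, with two annotated text regions and one predicted box that closely overlaps the first, `recall` returns 0.5 because only one of the two regions is found.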


Pages: 279-288

Page count: 10

