Residual attention-based multi-scale script identification in scene text images

Cited by: 0

Authors:

Ma, Mengkai [1]

Wang, Qiu-Feng [1]

Huang, Shan [2]

Huang, Shen [2]

Goulermas, Yannis [3]

Huang, Kaizhu [1]

Affiliations:

[1] Xian Jiaotong Liverpool Univ, Sch Adv Technol, Dept Intelligent Sci, Suzhou, Peoples R China

[2] Tencent Technol Co Ltd, Beijing, Peoples R China

[3] Univ Liverpool, Dept Comp Sci, Liverpool, Merseyside, England

Source:

NEUROCOMPUTING | 2021 / Vol. 421

Funding:

National Natural Science Foundation of China;

Keywords:

Script identification; Attention mechanism; Multi-scale features; Feature fusion; Global max pooling; NEURAL-NETWORK;

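DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract: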

Script identification is an essential step in the text extraction pipeline for multi-lingual applications. This paper presents an effective approach to identifying scripts in scene text images. Because of complicated backgrounds, varied text styles, and character similarity across languages, script identification remains an unsolved problem. Under the general classification framework of script identification, we investigate two important components: feature extraction and the classification layer. For feature extraction, we use a hierarchical feature fusion block to extract multi-scale features. Furthermore, we adopt an attention mechanism to capture the locally discriminative parts of the feature maps. In the classification layer, we use a fully convolutional classifier to generate channel-level classifications, which are then processed by a global pooling layer to improve classification efficiency. We evaluated the proposed approach on the benchmark datasets RRC-MLT2017, SIW-13, CVSI-2015 and MLe2e, and the experimental results show the effectiveness of each carefully designed component. Finally, we achieve better performance than competing models, with correct rates of 89.66%, 96.11%, 98.78% and 97.20% on RRC-MLT2017, SIW-13, CVSI-2015 and MLe2e, respectively. (c) 2020 Elsevier B.V. All rights reserved.
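
The classification path described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the residual attention form `F * (1 + sigmoid(M))`, the use of a 1x1 convolution as the fully convolutional classifier, the tensor shapes, and all function names are assumptions made for illustration. The sketch only shows how per-class score maps can be collapsed into one score per script by global max pooling.

```python
import numpy as np


def residual_attention(features, mask_logits):
    """Reweight features as F * (1 + sigmoid(M)).

    Assumed form of residual attention; the abstract does not give the formula.
    """
    mask = 1.0 / (1.0 + np.exp(-mask_logits))
    return features * (1.0 + mask)


def conv1x1_classifier(features, weights, bias):
    """Apply a 1x1 convolution: (C, H, W) features -> (K, H, W) class score maps."""
    return np.einsum("kc,chw->khw", weights, features) + bias[:, None, None]


def global_max_pool(score_maps):
    """Keep the strongest spatial response per class: (K, H, W) -> (K,)."""
    return score_maps.max(axis=(1, 2))


def predict_script(features, mask_logits, weights, bias):
    """Hypothetical end-to-end head: attention, 1x1 classifier, global max pooling."""
    attended = residual_attention(features, mask_logits)
    score_maps = conv1x1_classifier(attended, weights, bias)
    logits = global_max_pool(score_maps)
    return int(np.argmax(logits)), logits


# Illustrative shapes only: 8 channels, a 4 x 16 feature map, 5 script classes.
rng = np.random.default_rng(0)
C, H, W, K = 8, 4, 16, 5
features = rng.standard_normal((C, H, W))
mask_logits = rng.standard_normal((C, H, W))
weights = rng.standard_normal((K, C))
bias = np.zeros(K)

label, logits = predict_script(features, mask_logits, weights, bias)
print(label, logits.shape)
```

Because the pooling step collapses both spatial axes, the same `weights` and `bias` in this sketch apply to feature maps of any height and width, which suits text images of varying length.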

Pages: 222-233

Number of pages: 12

References: 53 in total
