Residual attention-based multi-scale script identification in scene text images

Citations: 0
Authors
Ma, Mengkai [1]
Wang, Qiu-Feng [1]
Huang, Shan [2]
Huang, Shen [2]
Goulermas, Yannis [3]
Huang, Kaizhu [1]
Affiliations
[1] Xian Jiaotong Liverpool Univ, Sch Adv Technol, Dept Intelligent Sci, Suzhou, Peoples R China
[2] Tencent Technol Co Ltd, Beijing, Peoples R China
[3] Univ Liverpool, Dept Comp Sci, Liverpool, Merseyside, England
Funding
National Natural Science Foundation of China
Keywords
Script identification; Attention mechanism; Multi-scale features; Feature fusion; Global max pooling; Neural network
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Script identification is an essential step in the text extraction pipeline for multilingual applications. This paper presents an effective approach to identifying scripts in scene text images. Because of complicated backgrounds, varied text styles, and the similarity of characters across languages, script identification remains an unsolved problem. Within the general classification framework for script identification, we investigate two important components: feature extraction and the classification layer. For feature extraction, we use a hierarchical feature fusion block to extract multi-scale features, and we adopt an attention mechanism to capture the locally discriminative parts of the feature maps. In the classification layer, we use a fully convolutional classifier to generate channel-level classifications, which are then processed by a global pooling layer to improve classification efficiency. We evaluated the proposed approach on the benchmark datasets RRC-MLT2017, SIW-13, CVSI-2015 and MLe2e, and the experimental results show the effectiveness of each elaborately designed component. Finally, we achieve better performance than competitive models, with correct rates of 89.66%, 96.11%, 98.78% and 97.20% on RRC-MLT2017, SIW-13, CVSI-2015 and MLe2e, respectively. © 2020 Elsevier B.V. All rights reserved.
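The classification layer described in the abstract can be illustrated with a minimal sketch. It assumes the fully convolutional classifier is a 1x1 convolution that yields one score map per script class, and that the global pooling layer is the global max pooling named in the keywords. The function names, shapes, and weights below are hypothetical and do not reproduce the authors' implementation.

```python
from typing import List

# Hypothetical layout: feature_map[channel][row][column]
FeatureMap = List[List[List[float]]]


def conv1x1(features: FeatureMap,
            weights: List[List[float]],
            bias: List[float]) -> FeatureMap:
    """Apply a 1x1 convolution, producing one score map per class.

    weights[k][c] is the weight from input channel c to class k.
    """
    in_channels = len(features)
    height, width = len(features[0]), len(features[0][0])
    score_maps = []
    for k, class_weights in enumerate(weights):
        score_map = [
            [
                bias[k] + sum(class_weights[c] * features[c][y][x]
                              for c in range(in_channels))
                for x in range(width)
            ]
            for y in range(height)
        ]
        score_maps.append(score_map)
    return score_maps


def global_max_pool(score_maps: FeatureMap) -> List[float]:
    """Reduce each class score map to its single largest value."""
    return [max(max(row) for row in score_map) for score_map in score_maps]


def classify(features: FeatureMap,
             weights: List[List[float]],
             bias: List[float]) -> int:
    """Return the index of the class with the highest pooled score."""
    scores = global_max_pool(conv1x1(features, weights, bias))
    return max(range(len(scores)), key=scores.__getitem__)
```

Because the pooling step keeps only the strongest response per class map, a single highly discriminative image region can decide the prediction, and no fully connected layer tied to a fixed input size is needed.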
Pages: 222-233
Page count: 12