Mining discriminative patches for script identification in natural scene images

Times cited: 7
Authors
Lu, Liqiong [1,2]
Wu, Dong [1]
Tang, Ziwei [2]
Yi, Yaohua [2]
Huang, Faliang [3]
Affiliations
[1] Lingnan Normal Univ, Dept Informat Engn, Zhanjiang, Peoples R China
[2] Wuhan Univ, Sch Printing & Packaging, Wuhan, Peoples R China
[3] Nanning Normal Univ, Sch Comp & Informat Engn, Nanning, Peoples R China
Keywords
Script identification; score CNN; attention CNN; discriminative patches; scene images; WORD
DOI
10.3233/JIFS-200260
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper addresses script identification in natural scene images. Traditional CNNs (Convolutional Neural Networks) cannot fully solve this problem for two reasons. First, scene images have arbitrary aspect ratios, which makes them difficult for traditional CNNs that require a fixed-size input image. Second, some scripts with minor differences are easily confused because they share a subset of characters with identical shapes. We propose a novel approach that combines a Score CNN, an Attention CNN, and image patches. The Attention CNN determines whether a patch is discriminative and computes the weight with which that discriminative patch contributes to identifying the script of the whole image. The Score CNN takes a discriminative patch as input and predicts a score for each script type. First, patches of the same size are extracted from the scene images. Second, these patches are used as inputs to the Score CNN and the Attention CNN to train two patch-level classifiers. Finally, the outputs of these two classifiers for multiple discriminative patches extracted from the same image are fused to obtain the script type of that image. Using same-size patches as CNN inputs avoids the problems caused by the arbitrary aspect ratios of scene images, and the trained classifiers can mine discriminative patches to accurately distinguish confusable scripts. Experimental results on four public datasets show that the approach performs well.
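The three-step pipeline in the abstract (fixed-size patch extraction, per-patch scoring, fusion of discriminative patches) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `score_cnn` and `attention_cnn` are hypothetical stand-ins for the two trained networks, the image is a plain list of grayscale rows, and both the threshold test for "discriminative" and the attention-weighted averaging rule are assumed, since the abstract does not give the exact formulas.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Grayscale image stored as a list of pixel rows (hypothetical representation).
Image = List[List[float]]


def extract_patches(image: Image, size: int, stride: int) -> List[Image]:
    """Slide a size x size window over `image` with the given stride.

    Every patch has the same shape, so a CNN with a fixed input size can
    consume images of any aspect ratio. Assumes the image is at least
    size x size (for example, after resizing its height to `size`).
    """
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - size + 1, stride):
        for left in range(0, width - size + 1, stride):
            patches.append([row[left:left + size] for row in image[top:top + size]])
    return patches


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one patch's script logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_patch_predictions(
    patches: Sequence[Image],
    score_cnn: Callable[[Image], Sequence[float]],
    attention_cnn: Callable[[Image], float],
    threshold: float = 0.5,
) -> Tuple[int, List[float]]:
    """Attention-weighted average of per-patch script probabilities.

    Patches whose attention weight falls below `threshold` are treated as
    non-discriminative and skipped (an assumed selection rule).
    Returns the index of the predicted script and the fused distribution.
    """
    fused: List[float] = []
    total_weight = 0.0
    for patch in patches:
        weight = attention_cnn(patch)  # hypothetical contribution weight
        if weight < threshold:
            continue
        probs = softmax(score_cnn(patch))  # hypothetical per-script scores
        if not fused:
            fused = [0.0] * len(probs)
        fused = [f + weight * p for f, p in zip(fused, probs)]
        total_weight += weight
    if total_weight == 0.0:
        raise ValueError("no discriminative patch found in this image")
    fused = [f / total_weight for f in fused]
    best = max(range(len(fused)), key=fused.__getitem__)
    return best, fused


# Toy usage: a 4 x 8 image whose pixel value equals its column index.
image = [[float(col) for col in range(8)] for _ in range(4)]
patches = extract_patches(image, size=4, stride=2)  # 3 patches of 4 x 4


def mean(patch: Image) -> float:
    return sum(map(sum, patch)) / (len(patch) * len(patch[0]))


# Stand-ins for the trained networks: two "scripts"; brighter patches
# favor script 1 and receive larger attention weights.
label, probs = fuse_patch_predictions(
    patches,
    score_cnn=lambda p: [0.0, mean(p)],
    attention_cnn=lambda p: mean(p) / 7.0,
)
print(label)  # → 1
```

Skipping low-weight patches keeps patches made of characters shared by several scripts from diluting the image-level decision; whether the published method thresholds patches this way or weights every patch is not stated in the abstract.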
Pages: 551-563
Page count: 13