Multi-task learning for simultaneous script identification and keyword spotting in document images

Cited by: 14
Authors
Cheikhrouhou, Ahmed [1, 2]
Kessentini, Yousri [1, 3]
Kanoun, Slim [2]
Affiliations
[1] Digital Res Ctr Sfax, Sfax, Tunisia
[2] Univ Sfax, MIRACL Lab, Sfax, Tunisia
[3] SM RTS Lab Signals Syst aRtificial Intelligence &, Sfax, Tunisia
Keywords
CBP; CTC; Keyword spotting; Script identification; Handwritten; Recognition
DOI
10.1016/j.patcog.2021.107832
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
In this paper, an end-to-end multi-task deep neural network was proposed for simultaneous script identification and keyword spotting (KWS) in multilingual handwritten and printed document images. We introduced a unified approach that addresses both challenges cohesively by designing a novel CNN-BLSTM architecture. The script identification stage involves local and global feature extraction to allow the network to cover more relevant information. Unlike traditional feature fusion approaches, which build a linear feature concatenation, we employed compact bilinear pooling (CBP) to capture pairwise correlations between these features. The script identification result is then injected into the KWS module to eliminate characters of irrelevant scripts and perform the decoding stage in single-script mode. All network parameters were trained end-to-end using multi-task learning that jointly minimizes the negative log-likelihood (NLL) loss for script identification and the connectionist temporal classification (CTC) loss for KWS. Our approach was evaluated on a variety of public datasets covering different languages and writing types. Experiments demonstrated the efficacy of our deep multi-task representation learning compared with state-of-the-art systems for both keyword spotting and script identification. (c) 2021 Elsevier Ltd. All rights reserved.
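To make the fusion step in the abstract more concrete, the following is a minimal sketch of compact bilinear pooling in its commonly used Count Sketch plus FFT form. The abstract does not give feature dimensions, hashing details, or implementation specifics, so everything here is an assumption for illustration: the function names (`make_sketch_params`, `count_sketch`, `compact_bilinear_pool`), the feature sizes, and the output dimension are hypothetical and are not taken from the paper.

```python
import numpy as np


def make_sketch_params(in_dim, out_dim, rng):
    """Draw a random bucket index h[i] in [0, out_dim) and a sign s[i] in {-1, +1}."""
    h = rng.integers(0, out_dim, size=in_dim)
    s = rng.choice([-1.0, 1.0], size=in_dim)
    return h, s


def count_sketch(x, h, s, out_dim):
    """Project x to out_dim by accumulating s[i] * x[i] into bucket h[i]."""
    sketch = np.zeros(out_dim)
    np.add.at(sketch, h, s * x)  # unbuffered add, so repeated buckets accumulate
    return sketch


def compact_bilinear_pool(local_feat, global_feat, params_local, params_global, out_dim):
    """Fuse two feature vectors by circular convolution of their count sketches.

    The circular convolution is computed as an element-wise product in the
    frequency domain. The result is a sketch of the outer product of the two
    inputs, so it encodes pairwise interactions without forming the full
    outer product explicitly.
    """
    sk_local = count_sketch(local_feat, *params_local, out_dim)
    sk_global = count_sketch(global_feat, *params_global, out_dim)
    return np.fft.irfft(np.fft.rfft(sk_local) * np.fft.rfft(sk_global), n=out_dim)


# Hypothetical sizes, chosen only for illustration.
rng = np.random.default_rng(0)
out_dim = 512
params_local = make_sketch_params(128, out_dim, rng)
params_global = make_sketch_params(256, out_dim, rng)

local_feat = rng.standard_normal(128)    # stand-in for a local feature vector
global_feat = rng.standard_normal(256)   # stand-in for a global feature vector

fused = compact_bilinear_pool(local_feat, global_feat, params_local, params_global, out_dim)
print(fused.shape)  # (512,)
```

Compared with concatenation, which yields a vector of length 128 + 256 in this toy setting, the fused vector summarizes all 128 x 256 pairwise products in a fixed 512-dimensional sketch. In a trainable network the hash and sign parameters would typically be drawn once and kept fixed.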
Pages: 10