DPF-S2S: A novel dual-pathway-fusion-based sequence-to-sequence text recognition model

Cited by: 10
Authors
Zhang, Yuqing [1]
Wu, Peishu [1]
Li, Han [1]
Liu, Yurong [2]
Alsaadi, Fuad E. [3]
Zeng, Nianyin [1]
Affiliations
[1] Xiamen Univ, Dept Instrumental & Elect Engn, Fujian 361005, Peoples R China
[2] Yangzhou Univ, Dept Math, Yangzhou 225002, Peoples R China
[3] King Abdulaziz Univ, Fac Engn, Dept Elect & Comp Engn, Commun Syst & Networks Res Grp, Jeddah 21589, Saudi Arabia
Keywords
Text recognition; Double alignment; Fusion operations; Attention maps; Convolutional neural network; Scene
DOI
10.1016/j.neucom.2022.12.034
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, a novel dual-pathway-fusion-based sequence-to-sequence learning model (DPF-S2S) is proposed for text recognition in the wild, which mainly focuses on enriching the spatial information and extracting high-dimensional representation features to assist decoding. In particular, a double alignment module is developed to solve the problem of text misalignment, where both position and vision information are well considered. Moreover, a global fusion module is deployed to enrich 2D information in the aligned attention maps, which benefits accurate recognition in complicated scenes with arbitrary text shapes and poor imaging conditions. Benchmark evaluations on seven datasets have demonstrated the superiority of the proposed DPF-S2S model in comparison to other state-of-the-art text recognition methods; the model is highly competitive at identifying text in both regular and irregular scenes. In addition, extensive ablation studies have been carried out, which validate the effectiveness of the strategies applied in the proposed DPF-S2S. (c) 2022 Elsevier B.V. All rights reserved.
Pages: 182-190
Page count: 9