Scene text spotting based on end-to-end

Cited by: 0
Authors
Wei G. [1, 2]
Rong W. [1 ]
Liang Y. [1 ]
Xiao X. [1 ]
Liu X. [1 ]
Affiliations
[1] College of Computer Science and Engineering, Shandong University of Science and Technology, Shandong, Qingdao
[2] College of Intelligent Equipment, Shandong University of Science and Technology, Shandong, Taian
Keywords
End-to-end; Joint optimization; SAM-BiLSTM; Scene text spotting; TCM;
DOI
10.3233/JIFS-200903
Chinese Library Classification
TN911 [Communication theory]
Discipline classification code
081002
Abstract
To address the problem that traditional OCR pipelines ignore the inherent connection between text detection and text recognition, this paper proposes a novel end-to-end text spotting framework. The framework consists of three parts: a shared convolutional feature network, a text detector, and a text recognizer. Because both tasks share the convolutional feature network, the text detection network and the text recognition network can be jointly optimized. This reduces the computational burden and makes effective use of the inherent connection between detection and recognition. The model adds a TCM (Text Context Module) to Mask R-CNN, which effectively addresses the negative sample problem in text detection. For recognition, the paper proposes a model based on SAM-BiLSTM (spatial attention mechanism with BiLSTM), which extracts the semantic information between characters more effectively. The model significantly surpasses state-of-the-art methods on several text detection and text spotting benchmarks, including ICDAR 2015 and Total-Text. © 2021 - IOS Press. All rights reserved.
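The shared-feature idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the class and function names, the toy "features", the squared-error losses, and the weighted-sum joint loss with `rec_weight` are all assumptions made for illustration, since the abstract does not specify the loss. The sketch only shows that the backbone runs once per image and that both task losses depend on the same features.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SharedBackbone:
    """Hypothetical stand-in for the shared convolutional feature network."""
    calls: int = 0  # counts forward passes, to show the features are reused

    def __call__(self, image: List[float]) -> List[float]:
        self.calls += 1
        # Toy "feature extraction": scale every input value.
        return [0.5 * p for p in image]


def detection_loss(features: List[float], target: float) -> float:
    # Toy detector head: squared error of the mean feature (assumption).
    score = sum(features) / len(features)
    return (score - target) ** 2


def recognition_loss(features: List[float], target: float) -> float:
    # Toy recognizer head: squared error of the max feature (assumption).
    score = max(features)
    return (score - target) ** 2


def joint_loss(backbone: SharedBackbone, image: List[float],
               det_target: float, rec_target: float,
               rec_weight: float = 1.0) -> float:
    # The backbone is evaluated once; both heads consume the same features.
    features = backbone(image)
    l_det = detection_loss(features, det_target)
    l_rec = recognition_loss(features, rec_target)
    # Assumed form of joint optimization: a weighted sum of the two losses.
    return l_det + rec_weight * l_rec


backbone = SharedBackbone()
loss = joint_loss(backbone, [0.0, 2.0, 4.0], det_target=1.0, rec_target=2.0)
print(loss, backbone.calls)
```

In a real system the two heads would be neural networks and the gradient of the summed loss would flow back into the shared backbone, which is what lets detection and recognition inform each other.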
Pages: 8871-8881
Page count: 10