RTNet: An End-to-End Method for Handwritten Text Image Translation

Cited by: 4
Authors
Su, Tonghua [1]
Liu, Shuchen [1]
Zhou, Shengjie [1]
Affiliations
[1] Harbin Inst Technol, Sch Software, Harbin, Peoples R China
Source
DOCUMENT ANALYSIS AND RECOGNITION - ICDAR 2021, PT II | 2021 / Vol. 12822
Funding
National Natural Science Foundation of China;
Keywords
Machine translation; Text recognition; Image text translation; Handwritten text; End-to-End;
DOI
10.1007/978-3-030-86331-9_7
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Text image recognition and translation have a wide range of applications. A straightforward solution is a two-stage approach: first recognize the text, then translate it into the target language. In that approach, the handwritten text recognition model and the machine translation model are trained separately, so any transcription error may degrade the translation quality. This paper proposes an end-to-end learning architecture that directly translates English handwritten text in images into Chinese. The handwriting recognition task and the translation task are combined in a unified deep learning model. First, we perform visual encoding; next, we bridge the semantic gap using a feature transformer; finally, a textual decoder generates the target sentence. To train the model effectively, we use transfer learning to improve its generalization under low-resource conditions. Experiments compare our method with the traditional two-stage one. The results indicate that the performance of the end-to-end model improves greatly as the amount of training data increases. Furthermore, when a larger amount of training data is available, the end-to-end model is more advantageous.
Pages: 99-113
Number of pages: 15