RTNet: An End-to-End Method for Handwritten Text Image Translation

Cited by: 4
Authors
Su, Tonghua [1]
Liu, Shuchen [1]
Zhou, Shengjie [1]
Affiliations
[1] Harbin Inst Technol, Sch Software, Harbin, Peoples R China
Source
DOCUMENT ANALYSIS AND RECOGNITION - ICDAR 2021, PT II | 2021 / Vol. 12822
Funding
National Natural Science Foundation of China
Keywords
Machine translation; Text recognition; Image text translation; Handwritten text; End-to-End;
DOI
10.1007/978-3-030-86331-9_7
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Text image recognition and translation have a wide range of applications. A straightforward solution is a two-stage approach: first recognize the text, then translate it into the target language. In that approach, the handwritten text recognition model and the machine translation model are trained separately, so any transcription error may degrade the translation quality. This paper proposes an end-to-end learning architecture that directly translates English handwritten text in images into Chinese. The handwriting recognition task and the translation task are combined in a unified deep learning model. First, we perform visual encoding; next, we bridge the semantic gap using a feature transformer; finally, a textual decoder generates the target sentence. To train the model effectively, we use transfer learning to improve its generalization under low-resource conditions. Experiments compare our method with the traditional two-stage one. The results indicate that the performance of the end-to-end model improves greatly as the amount of training data increases. Furthermore, when a larger amount of training data is available, the end-to-end model is more advantageous.
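The three-stage data flow named in the abstract (visual encoding, feature transformer, textual decoder) can be sketched as below. This is a toy illustration only, not the authors' RTNet implementation: the function names, the hand-made column features, the fixed linear map, and the per-step argmax are all assumptions chosen so the example runs with the Python standard library alone. A real model would use learned networks and, most likely, an autoregressive decoder.

```python
from typing import List

Image = List[List[float]]     # grayscale pixels, rows x columns
Features = List[List[float]]  # sequence of feature vectors


def visual_encoder(image: Image) -> Features:
    """Toy stand-in for visual encoding: one 2-d feature per pixel column."""
    height = len(image)
    feats = []
    for x in range(len(image[0])):
        column = [row[x] for row in image]
        mean = sum(column) / height
        # Hypothetical features: column mean and column intensity range.
        feats.append([mean, max(column) - min(column)])
    return feats


def feature_transformer(feats: Features, weights: List[List[float]]) -> Features:
    """Toy stand-in for the feature transformer: a linear map per time step.

    `weights` has one row per vocabulary entry and one column per visual feature.
    """
    return [[sum(w * v for w, v in zip(row, vec)) for row in weights]
            for vec in feats]


def textual_decoder(memory: Features, vocab: List[str], max_len: int = 8) -> List[str]:
    """Toy stand-in for the textual decoder: one greedy token choice per step."""
    tokens = []
    for vec in memory[:max_len]:
        # Pick the vocabulary index with the highest score (argmax).
        best = max(range(len(vocab)), key=lambda i: vec[i])
        if vocab[best] == "<eos>":
            break
        tokens.append(vocab[best])
    return tokens


def translate_image(image: Image, weights: List[List[float]], vocab: List[str]) -> List[str]:
    """End-to-end path: no intermediate source-language transcript is produced."""
    return textual_decoder(feature_transformer(visual_encoder(image), weights), vocab)


# Usage with made-up inputs (placeholder target tokens, not real model output).
image = [[0.0, 1.0, 0.5], [0.0, 0.0, 0.5]]
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
vocab = ["ni", "hao", "<eos>"]
print(translate_image(image, weights, vocab))
```

The point of the sketch is the interface between stages: continuous features flow from the image to the decoder, whereas a two-stage system would pass a discrete transcript between a recognizer and a translator, so a recognition mistake could not be recovered downstream.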
Pages: 99 - 113
Page count: 15
Related papers
50 in total
  • [1] Modal Contrastive Learning Based End-to-End Text Image Machine Translation
    Ma, Cong
    Han, Xu
    Wu, Linghui
    Zhang, Yaping
    Zhao, Yang
    Zhou, Yu
    Zong, Chengqing
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 2153 - 2165
  • [2] A COMPARATIVE STUDY ON END-TO-END SPEECH TO TEXT TRANSLATION
    Bahar, Parnia
    Bieschke, Tobias
    Ney, Hermann
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019 : 792 - 799
  • [3] End-to-End Speech-to-Text Translation: A Survey
    Sethiya, Nivedita
    Maurya, Chandresh Kumar
    COMPUTER SPEECH AND LANGUAGE, 2025, 90
  • [4] End-to-End Chinese Image Text Recognition with Attention Model
    Sheng, Fenfen
    Zhai, Chuanlei
    Chen, Zhineng
    Xu, Bo
    NEURAL INFORMATION PROCESSING (ICONIP 2017), PT III, 2017, 10636 : 180 - 189
  • [5] End-to-end attention convolutional recurrent network for online handwritten Chinese text recognition
    Qu, Xiwen
    Wu, Zhihong
    Huang, Jun
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (23) : 62541 - 62558
  • [6] Tell, Imagine, and Search: End-to-end Learning for Composing Text and Image to Image Retrieval
    Zhang, Feifei
    Xu, Mingliang
    Xu, Changsheng
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2022, 18 (02)
  • [7] Unconstrained end-to-end text reading with feature rectification
    Du, Chen
    Wang, Yanna
    Wang, Chunheng
    Xiao, Baihua
    Shi, Cunzhao
    PATTERN RECOGNITION LETTERS, 2021, 149 : 1 - 8
  • [8] END-TO-END CHINESE TEXT RECOGNITION
    Hu, Jie
    Guo, Tszhang
    Cao, Ji
    Zhang, Changshui
    2017 IEEE GLOBAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING (GLOBALSIP 2017), 2017 : 1407 - 1411
  • [9] Improvement of the end-to-end scene text recognition method for "text-to-speech" conversion
    Makhmudov, Fazliddin
    Mukhiddinov, Mukhriddin
    Abdusalomov, Akmalbek
    Avazov, Kuldoshbay
    Khamdamov, Utkir
    Cho, Young Im
    INTERNATIONAL JOURNAL OF WAVELETS MULTIRESOLUTION AND INFORMATION PROCESSING, 2020, 18 (06)
  • [10] FREE: A Fast and Robust End-to-End Video Text Spotter
    Cheng, Zhanzhan
    Lu, Jing
    Zou, Baorui
    Qiao, Liang
    Xu, Yunlu
    Pu, Shiliang
    Niu, Yi
    Wu, Fei
    Zhou, Shuigeng
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 822 - 837