RTNet: An End-to-End Method for Handwritten Text Image Translation

Citations: 4
Authors
Su, Tonghua [1]
Liu, Shuchen [1]
Zhou, Shengjie [1]
Affiliation
[1] Harbin Inst Technol, Sch Software, Harbin, Peoples R China
Source
DOCUMENT ANALYSIS AND RECOGNITION - ICDAR 2021, PT II | 2021 / Vol. 12822
Funding
National Natural Science Foundation of China;
Keywords
Machine translation; Text recognition; Image text translation; Handwritten text; End-to-End;
DOI
10.1007/978-3-030-86331-9_7
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Text image recognition and translation have a wide range of applications. A straightforward solution is a two-stage approach: first recognize the text, then translate it into the target language. In that setting the handwritten text recognition model and the machine translation model are trained separately, so any transcription error may degrade the translation quality. This paper proposes an end-to-end learning architecture that directly translates English handwritten text in images into Chinese. The handwriting recognition task and the translation task are combined in a unified deep learning model. First, a visual encoder processes the image; next, a feature transformer bridges the semantic gaps; finally, a textual decoder generates the target sentence. To train the model effectively, we use transfer learning to improve its generalization under low-resource conditions. Experiments compare our method with the traditional two-stage one. The results indicate that the performance of the end-to-end model improves greatly as the amount of training data increases. Furthermore, when a larger amount of training data is available, the end-to-end model is more advantageous.
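The three-stage data flow named in the abstract (visual encoding, feature transformer, textual decoder) can be illustrated with a toy sketch. This is not the authors' model: every function body, the weight matrix, and the vocabulary below are hypothetical placeholders chosen only to show how one image passes through the three stages in a single composed function, with no intermediate English transcript.

```python
from typing import List

Image = List[List[float]]      # rows x columns of grey-level pixels
Features = List[List[float]]   # sequence of feature vectors


def visual_encoder(image: Image) -> Features:
    """Toy stand-in: one 2-d feature (mean, max) per image column."""
    columns = list(zip(*image))
    return [[sum(col) / len(col), max(col)] for col in columns]


def feature_transformer(features: Features, weights: List[List[float]]) -> Features:
    """Toy stand-in: a linear map from visual features to per-token scores."""
    return [
        [sum(w * x for w, x in zip(row, vec)) for row in weights]
        for vec in features
    ]


def textual_decoder(scores: Features, vocab: List[str]) -> List[str]:
    """Toy stand-in: pick the highest-scoring vocabulary entry at each step."""
    return [vocab[max(range(len(vec)), key=vec.__getitem__)] for vec in scores]


def translate_image(image: Image, weights: List[List[float]], vocab: List[str]) -> List[str]:
    """Compose the three stages end to end, image in and target tokens out."""
    return textual_decoder(feature_transformer(visual_encoder(image), weights), vocab)


# Hypothetical inputs: a 2x2 "image", one weight row per vocabulary entry.
image = [[0.0, 1.0], [0.0, 0.5]]
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
vocab = ["你", "好", "。"]
print(translate_image(image, weights, vocab))
```

A real system would use learned neural components and an autoregressive decoder whose output length differs from the input length; the sketch only mirrors the order in which the stages are chained.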
Pages: 99-113
Page count: 15