RTNet: An End-to-End Method for Handwritten Text Image Translation

Citations: 4

Authors:

Su, Tonghua ^{[1]}

Liu, Shuchen ^{[1]}

Zhou, Shengjie ^{[1]}

Affiliations:

[1] Harbin Inst Technol, Sch Software, Harbin, Peoples R China

Source:

DOCUMENT ANALYSIS AND RECOGNITION - ICDAR 2021, PT II | 2021 / Vol. 12822

Funding:

National Natural Science Foundation of China;

Keywords:

Machine translation; Text recognition; Image text translation; Handwritten text; End-to-End;

DOI:

10.1007/978-3-030-86331-9_7

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Text image recognition and translation have a wide range of applications. A straightforward solution is a two-stage approach: first recognize the text, then translate it into the target language. In that approach, the handwritten text recognition model and the machine translation model are trained separately, so any transcription error may degrade the translation quality. This paper proposes an end-to-end learning architecture that directly translates English handwritten text in images into Chinese. The handwriting recognition task and the translation task are combined in a unified deep learning model. First we perform visual encoding, then bridge the semantic gap with a feature transformer, and finally use a textual decoder to generate the target sentence. To train the model effectively, we use transfer learning to improve its generalization under low-resource conditions. Experiments compare our method with the traditional two-stage one. The results indicate that the performance of the end-to-end model improves greatly as the amount of training data increases. Furthermore, when a larger amount of training data is available, the end-to-end model is more advantageous.

Pages: 99-113

Page count: 15
