RTNet: An End-to-End Method for Handwritten Text Image Translation

Cited by: 4

Authors:

Su, Tonghua [1]

Liu, Shuchen [1]

Zhou, Shengjie [1]

Affiliations:

[1] Harbin Inst Technol, Sch Software, Harbin, Peoples R China

Source:

DOCUMENT ANALYSIS AND RECOGNITION - ICDAR 2021, PT II | 2021 / Vol. 12822

Funding:

National Natural Science Foundation of China;

Keywords:

Machine translation; Text recognition; Image text translation; Handwritten text; End-to-End;

DOI:

10.1007/978-3-030-86331-9_7

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Text image recognition and translation have a wide range of applications. A straightforward solution is a two-stage approach: first perform text recognition, then translate the recognized text into the target language. In that approach the handwritten text recognition model and the machine translation model are trained separately, so any transcription error may degrade the translation quality. This paper proposes an end-to-end learning architecture that directly translates English handwritten text in images into Chinese. The handwriting recognition task and the translation task are combined in a unified deep learning model. The model first performs visual encoding, then bridges the semantic gap with a feature transformer, and finally uses a textual decoder to generate the target sentence. To train the model effectively, transfer learning is used to improve its generalization under low-resource conditions. Experiments compare the method with the traditional two-stage one. The results indicate that the performance of the end-to-end model improves greatly as the amount of training data increases. Furthermore, when a larger amount of training data is available, the end-to-end model is more advantageous.
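The three stages named in the abstract (visual encoding, feature transformer, textual decoder) can be pictured as a single composed function from pixels to target tokens. The following is only a data-flow sketch under loud assumptions: every function name, the column-per-time-step encoding, the bucket averaging, and the per-step argmax decoding are hypothetical placeholders, not the networks or the training procedure used in the paper.

```python
from typing import List

Image = List[List[float]]      # grayscale pixels: rows x columns
Features = List[List[float]]   # one feature vector per time step


def visual_encode(image: Image) -> Features:
    # Placeholder encoder (assumption): treat each pixel column as one time step.
    return [list(column) for column in zip(*image)]


def feature_transform(visual: Features, dim: int) -> Features:
    # Placeholder bridge (assumption): average each visual vector into `dim`
    # buckets so the decoder receives vectors of its own feature size.
    bridged = []
    for vec in visual:
        buckets: List[List[float]] = [[] for _ in range(dim)]
        for i, value in enumerate(vec):
            buckets[i % dim].append(value)
        bridged.append([sum(b) / len(b) if b else 0.0 for b in buckets])
    return bridged


def textual_decode(features: Features, vocab: List[str]) -> List[str]:
    # Placeholder decoder (assumption): pick the highest-scoring vocabulary
    # entry at every step; a real decoder would generate autoregressively.
    tokens = []
    for vec in features:
        best = max(range(len(vec)), key=lambda i: vec[i])
        tokens.append(vocab[best])
    return tokens


def translate_image(image: Image, vocab: List[str]) -> List[str]:
    # One pass from pixels to target tokens, with no intermediate transcript.
    bridged = feature_transform(visual_encode(image), len(vocab))
    return textual_decode(bridged, vocab)


if __name__ == "__main__":
    toy_image = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.4]]   # 3 rows, 2 columns
    print(translate_image(toy_image, ["A", "B", "C"]))
```

The point of the sketch is the shape of the pipeline: unlike the two-stage approach, no source-language transcript is produced between the image and the target sentence, so there is no separate recognition output whose errors could propagate into translation.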


Pages: 99-113

Page count: 15

