RTNet: An End-to-End Method for Handwritten Text Image Translation

Cited by: 4

Authors:

Su, Tonghua [1]

Liu, Shuchen [1]

Zhou, Shengjie [1]

Affiliations:

[1] Harbin Inst Technol, Sch Software, Harbin, Peoples R China

Source:

DOCUMENT ANALYSIS AND RECOGNITION - ICDAR 2021, PT II | 2021 / Vol. 12822

Funding:

National Natural Science Foundation of China;

Keywords:

Machine translation; Text recognition; Image text translation; Handwritten text; End-to-End;

DOI:

10.1007/978-3-030-86331-9_7

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812;

Abstract:

Text image recognition and translation have a wide range of applications. A straightforward solution is a two-stage approach: first recognize the text, then translate it into the target language. In that approach, the handwritten text recognition model and the machine translation model are trained separately, so any transcription error may degrade the translation quality. This paper proposes an end-to-end learning architecture that directly translates English handwritten text in images into Chinese. The handwriting recognition task and the translation task are combined in a unified deep learning model. First, we perform visual encoding; next, we bridge the semantic gap using a feature transformer; finally, a textual decoder generates the target sentence. To train the model effectively, we use transfer learning to improve its generalization under low-resource conditions. Experiments compare our method with the traditional two-stage one. The results indicate that the performance of the end-to-end model improves greatly as the amount of training data increases. Furthermore, when a larger amount of training data is available, the end-to-end model is more advantageous.

Pages: 99-113

Page count: 15
