Improving End-to-End Text Image Translation From the Auxiliary Text Translation Task

Cited by: 4

Authors:

Ma, Cong [1,2]

Zhang, Yaping [1,2]

Tu, Mei [4]

Han, Xu [1,2]

Wu, Linghui [1,2]

Zhao, Yang [1,2]

Zhou, Yu [2,3]

Affiliations:

[1] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China

[2] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit NLPR, 95 Zhongguan East Rd, Beijing 100190, Peoples R China

[3] Zhongke Fanyu Technol Co Ltd, Fanyu AI Lab, Beijing 100190, Peoples R China

[4] Samsung Res China Beijing SRC B, Beijing, Peoples R China

Source:

2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR) | 2022

Funding:

National Natural Science Foundation of China;

Keywords:

RECOGNITION; SEQUENCE;

DOI:

10.1109/ICPR56361.2022.9956695

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

End-to-end text image translation (TIT), which aims to translate source-language text embedded in images into a target language, has attracted intensive attention in recent research. However, data sparsity limits the performance of end-to-end TIT models. Multi-task learning can alleviate this problem by exploiting knowledge from complementary related tasks. In this paper, we propose a text-translation-enhanced text image translation method, which trains the end-to-end model with text translation as an auxiliary task. Through parameter sharing and multi-task training, the model can take advantage of easily available large-scale parallel text corpora. Extensive experimental results show that the proposed method outperforms existing end-to-end methods, and that joint multi-task learning with both text translation and recognition tasks achieves better results, indicating that the translation and recognition auxiliary tasks are complementary.
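
One common way to read the multi-task training described in the abstract is as a weighted sum of per-task losses over shared parameters. The expression below is only an illustrative sketch and is not taken from the paper: the shared parameters \theta, the loss names, and the weights \lambda_{MT} and \lambda_{OCR} are all assumptions introduced here.

```latex
% Hypothetical joint objective (an assumption, not the authors' stated formulation):
%   L_TIT : loss of the main text image translation task
%   L_MT  : loss of the auxiliary text translation task
%   L_OCR : loss of the auxiliary text recognition task
%   lambda_MT, lambda_OCR >= 0 : assumed task weights
\mathcal{L}(\theta)
  = \mathcal{L}_{\mathrm{TIT}}(\theta)
  + \lambda_{\mathrm{MT}}\,\mathcal{L}_{\mathrm{MT}}(\theta)
  + \lambda_{\mathrm{OCR}}\,\mathcal{L}_{\mathrm{OCR}}(\theta)
```

Under this reading, setting \lambda_{OCR} = 0 would correspond to training with text translation as the only auxiliary task, and nonzero values for both weights to the joint setting with translation and recognition.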


Pages: 1664-1670

Number of pages: 7

