Modal Contrastive Learning Based End-to-End Text Image Machine Translation

Cited by: 0

Authors:

Ma, Cong [1,2]

Han, Xu [1,2]

Wu, Linghui [1,2]

Zhang, Yaping [1,2]

Zhao, Yang [1,2]

Zhou, Yu [1,2]

Zong, Chengqing [1,2]

Affiliations:

[1] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China

[2] Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2024 / Vol. 32

Funding:

National Natural Science Foundation of China;

Keywords:

Transformers; Machine translation; Decoding; Semantics; Pipelines; Text recognition; Task analysis; Text image machine translation; contrastive learning; text image recognition; machine translation; RECOGNITION;

DOI:

10.1109/TASLP.2023.3324540

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Text image machine translation (TIMT) aims to directly translate source-language text embedded in images into the target language. Most existing systems follow a cascaded pipeline from recognition to translation, which suffers from error propagation, parameter redundancy, and information reduction. An end-to-end model has the potential to alleviate these issues by bridging the recognition and translation models. However, end-to-end training is challenged by limited data and by the modality gap between text and image. In this paper, we propose a novel end-to-end model, namely Modal contrastive learning based End-to-end Text Image Machine Translation (METIMT), which addresses these challenges through an end-to-end text image machine translation architecture and modal contrastive learning. Specifically, an image encoder is designed to encode images into the same feature space as the corresponding text sentences, under the guidance of an intra-modal and inter-modal contrastive learning module. To further promote research on text image machine translation, we have constructed one synthetic and two real-world datasets. Extensive experiments show that our lighter, faster model outperforms not only existing pipeline methods but also state-of-the-art end-to-end models on both synthetic and real-world evaluation sets. Our code and dataset will be released to the public.
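The inter-modal contrastive objective mentioned in the abstract can be illustrated with a generic InfoNCE-style loss. The sketch below is not taken from the paper: the function names (`info_nce`, `inter_modal_loss`), the temperature value, and the toy feature vectors are all assumptions chosen for illustration. It only shows the general idea of pulling an image feature toward the feature of its paired sentence while pushing it away from the other sentences in the batch.

```python
import math

def _normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def info_nce(anchors, candidates, temperature=0.1):
    """Mean InfoNCE loss where anchors[i] should match candidates[i].

    Every other candidate in the batch acts as a negative example.
    """
    a = [_normalize(v) for v in anchors]
    c = [_normalize(v) for v in candidates]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, cj)) / temperature for cj in c]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(a)

def inter_modal_loss(image_feats, text_feats, temperature=0.1):
    """Symmetric image-to-text and text-to-image contrastive loss (assumed form)."""
    return 0.5 * (info_nce(image_feats, text_feats, temperature)
                  + info_nce(text_feats, image_feats, temperature))

# Hypothetical sentence-level features for a batch of two image/text pairs.
image_feats = [[1.0, 0.1], [0.1, 1.0]]
text_feats = [[0.9, 0.0], [0.0, 0.8]]

aligned = inter_modal_loss(image_feats, text_feats)          # correct pairing
shuffled = inter_modal_loss(image_feats, text_feats[::-1])   # mismatched pairing
```

With correctly paired features the loss is close to zero, and it grows when the pairing is shuffled. An intra-modal term could plausibly reuse `info_nce` with two views of the same modality, but how the paper builds those views is not stated in the abstract.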

Pages: 2153-2165

Page count: 13
