Contrastive Adversarial Training for Multi-Modal Machine Translation

Cited by: 2
Authors
Huang, Xin [1 ]
Zhang, Jiajun [1 ]
Zong, Chengqing [1 ]
Affiliations
[1] Univ Chinese Acad Sci, Chinese Acad Sci, Sch Artificial Intelligence, Natl Lab Pattern Recognit, Inst Automat, Intelligence Bldg, 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
Keywords
contrastive learning; adversarial training; multi-modal machine translation
DOI
10.1145/3587267
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
The multi-modal machine translation task aims to improve translation quality with the help of additional visual input, which is expected to disambiguate or complement semantics when sentences contain ambiguous words or incomplete expressions. Existing methods have tried many ways to fuse visual information into text representations. However, only a minority of sentences need extra visual information as a complement, and without guidance, models tend to learn text-only translation from the majority of well-aligned translation pairs. In this article, we propose a contrastive adversarial training approach to enhance visual participation in semantic representation learning. By contrasting the multi-modal input with adversarial samples, the model learns to identify the most informative sample, namely the one coupled with a congruent image and several visual objects extracted from it. This approach prevents the visual information from being ignored and further fuses cross-modal information. We evaluate our method on three multi-modal language pairs. Experimental results show that our model improves translation accuracy, and further analysis shows that it is more sensitive to visual information.
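The contrastive objective sketched in the abstract, distinguishing the congruent image from adversarial candidates, can be illustrated with a generic InfoNCE-style loss. This is a minimal sketch under assumed conventions (cosine similarity, temperature scaling, positive at index 0); the paper's exact formulation may differ.

```python
import numpy as np

def contrastive_probs(text_vec, image_vecs, tau=0.1):
    """Softmax over cosine similarities between a text representation
    and a set of candidate image representations."""
    t = text_vec / np.linalg.norm(text_vec)
    imgs = image_vecs / np.linalg.norm(image_vecs, axis=1, keepdims=True)
    logits = (imgs @ t) / tau            # temperature-scaled similarities
    e = np.exp(logits - logits.max())    # numerically stable softmax
    return e / e.sum()

def contrastive_loss(text_vec, pos_image, neg_images, tau=0.1):
    """InfoNCE-style loss: the congruent (positive) image sits at index 0,
    adversarial samples serve as negatives."""
    candidates = np.vstack([pos_image] + list(neg_images))
    probs = contrastive_probs(text_vec, candidates, tau)
    return -np.log(probs[0])
```

Under this sketch, a text representation well aligned with its congruent image yields a low loss, while alignment with an adversarial negative drives the loss up, which is what pushes the model to attend to the visual input.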
Pages: 18
References: 53