Glancing Text and Vision Regularized Training to Enhance Machine Translation

Cited by: 1
Authors
Cheng, Pei [1 ]
Shi, Xiayang [1 ]
Liu, Beibei [2 ]
Li, Meng [2 ]
Affiliations
[1] Zhengzhou Univ Light Ind, Zhengzhou, Peoples R China
[2] South China Univ Technol, Guangzhou, Peoples R China
Source
ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2023, PT VIII | 2023 / Vol. 14261
Keywords
Multimodal; Consistency; Machine Translation;
DOI
10.1007/978-3-031-44198-1_22
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Bilingual parallel sentences combined with visual annotations create an innovative machine translation scenario within the encoder-decoder framework, known as multimodal machine translation. Typically, the visual annotation is encoded as an additional visual representation to enhance the time-dependent context vector when generating the target translation word by word. However, this approach only models the consistency between the visual annotation and the source language, and does not sufficiently consider the consistency among the source language, the target language, and the visual context. To address this problem, we propose a novel method that adds visual features to both the encoder and the decoder. In the encoder, we design a cross-modal correlation mechanism to effectively integrate textual and visual information. In the decoder, we design a multimodal graph to enhance the related information of vision and text. Experimental results show that the proposed approach significantly improves translation performance over strong baselines on the English-German/French language pairs. An ablation study further confirms the effectiveness of the proposed approach in improving translation quality.
Pages: 255-267
Page count: 13