Dual-level interactive multimodal-mixup encoder for multi-modal neural machine translation

Cited by: 9
Authors
Ye, Junjie [1,2]
Guo, Junjun [1,2]
Affiliations
[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming 650500, Yunnan, Peoples R China
[2] Yunnan Key Lab Artificial Intelligence, Kunming 650500, Yunnan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multi-modal neural machine translation; Dual-level interactive multimodal-mixup encoder; Transformer; Feature fusion;
DOI
10.1007/s10489-022-03331-8
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multi-modal neural machine translation (MNMT) focuses on using image information to guide text translation. Recent MNMT approaches have shown that incorporating visual features into a textual translation framework helps improve machine translation. However, visual features often contain information unrelated to the text, and this noisy visual feature fusion problem is rarely considered by traditional MNMT methods. How to extract useful visual features to enhance textual machine translation is therefore a key question for MNMT. In this paper, we propose a novel Dual-level Interactive Multimodal-Mixup Encoder (DLMulMix) based on multimodal-mixup for MNMT, which extracts useful visual features to enhance textual-level machine translation. We first employ Textual-visual Gating to extract text-related regional visual features, since we believe regional features are crucial for MNMT. Visual grid features are then employed to establish the image context of the effective regional features. Moreover, an effective visual-textual multimodal-mixup is adopted to align textual and visual features in a multi-modal common space and thereby improve textual-level machine translation. We evaluate the proposed method on the Multi30K dataset. The experimental results show that our approach outperforms previous efforts on both EN-DE and EN-FR tasks in terms of BLEU and METEOR scores.
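Since this record contains only the abstract, the following is a minimal PyTorch-style sketch of the two ideas the abstract names: a textual-visual gate that damps text-unrelated visual information, and a mixup-style interpolation that places textual and (pooled) visual features in a common space. All module names, tensor shapes, pooling choices, and the Beta-distributed mixing coefficient are illustrative assumptions, not the authors' DLMulMix implementation.

# Hypothetical sketch (not the authors' released code): textual-visual gating
# followed by a mixup-style interpolation of textual and visual features.
import torch
import torch.nn as nn


class TextualVisualGate(nn.Module):
    """Gate regional/grid visual features by their relevance to the sentence."""

    def __init__(self, d_model: int):
        super().__init__()
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, text: torch.Tensor, visual: torch.Tensor) -> torch.Tensor:
        # text:   (batch, len_t, d_model) textual encoder states
        # visual: (batch, len_v, d_model) projected regional or grid features
        text_ctx = text.mean(dim=1, keepdim=True).expand(-1, visual.size(1), -1)
        g = torch.sigmoid(self.gate(torch.cat([text_ctx, visual], dim=-1)))
        return g * visual  # text-unrelated visual dimensions are damped toward zero


def multimodal_mixup(text: torch.Tensor, gated_visual: torch.Tensor,
                     alpha: float = 0.2) -> torch.Tensor:
    """Mixup-style interpolation of textual states with a pooled visual context."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    visual_ctx = gated_visual.mean(dim=1, keepdim=True).expand_as(text)
    return lam * text + (1.0 - lam) * visual_ctx


if __name__ == "__main__":
    d_model = 512
    text = torch.randn(2, 20, d_model)    # toy textual encoder output
    visual = torch.randn(2, 36, d_model)  # toy regional visual features
    fused = multimodal_mixup(text, TextualVisualGate(d_model)(text, visual))
    print(fused.shape)  # torch.Size([2, 20, 512])

The interpolation follows the standard mixup rule, h = lambda * h_text + (1 - lambda) * h_visual with lambda drawn from Beta(alpha, alpha); how DLMulMix combines this with its dual-level (regional plus grid) visual features is detailed in the paper itself.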
Pages: 14194-14203
Number of pages: 10
Related Papers
50 records in total
  • [21] Video-guided machine translation via dual-level back-translation
    Chen, Shiyu
    Zeng, Yawen
    Cao, Da
    Lu, Shaofei
    KNOWLEDGE-BASED SYSTEMS, 2022, 245
  • [22] Multi-modal simultaneous machine translation fusion of image information
    Huang, Yan
Wang, Zhanyang
    Zhang, TianYuan
    Xu, Chun
Liang, Hui
    JOURNAL OF ENGINEERING RESEARCH, 2023, 11 (02):
  • [23] Multi-Modal Approaches for Post-Editing Machine Translation
    Herbig, Nico
    Pal, Santanu
    van Genabith, Josef
    Krueger, Antonio
    CHI 2019: PROCEEDINGS OF THE 2019 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS, 2019,
  • [24] Visual Agreement Regularized Training for Multi-Modal Machine Translation
    Yang, Pengcheng
    Chen, Boxing
    Zhang, Pei
    Sun, Xu
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 9418 - 9425
  • [25] Progressive modality-complement aggregative multitransformer for domain multi-modal neural machine translation
    Guo, Junjun
    Hou, Zhenyu
    Xian, Yantuan
    Yu, Zhengtao
    PATTERN RECOGNITION, 2024, 149
  • [26] MBIAN: Multi-level bilateral interactive attention network for multi-modal
    Sun, Kai
    Zhang, Jiangshe
    Wang, Jialin
    Xu, Shuang
    Zhang, Chunxia
    Hu, Junying
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 231
  • [27] Interactive natural language acquisition in a multi-modal recurrent neural architecture
    Heinrich, Stefan
    Wermter, Stefan
    CONNECTION SCIENCE, 2018, 30 (01) : 99 - 133
  • [28] MMPE: A Multi-Modal Interface for Post-Editing Machine Translation
    Herbig, Nico
    Duewel, Tim
    Pal, Santanu
    Meladaki, Kalliopi
    Monshizadeh, Mahsa
    Krueger, Antonio
    van Genabith, Josef
    58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), 2020, : 1691 - 1702
  • [29] HybridVocab: Towards Multi-Modal Machine Translation via Multi-Aspect Alignment
    Peng, Ru
    Zeng, Yawen
    Zhao, Junbo
    PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2022, 2022, : 380 - 388
  • [30] Hausa Visual Genome: A Dataset for Multi-Modal English to Hausa Machine Translation
    Abdulmumin, Idris
    Dash, Satya Ranjan
    Dawud, Musa Abdullahi
    Parida, Shantipriya
    Muhammad, Shamsuddeen Hassan
    Ahmad, Ibrahim Sa'id
    Panda, Subhadarshi
    Bojar, Ondrej
    Galadanci, Bashir Shehu
    Bello, Shehu Bello
    LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 6471 - 6479