Dual-level interactive multimodal-mixup encoder for multi-modal neural machine translation

Cited by: 9

Authors:

Ye, Junjie [1,2]

Guo, Junjun [1,2]

Affiliations:

[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming 650500, Yunnan, Peoples R China

[2] Yunnan Key Lab Artificial Intelligence, Kunming 650500, Yunnan, Peoples R China

Source:

APPLIED INTELLIGENCE | 2022 / Vol. 52 / Issue 12

Funding:

National Natural Science Foundation of China;

Keywords:

Multi-modal neural machine translation; Dual-level interactive multimodal-mixup encoder; Transformer; Feature fusion;

DOI:

10.1007/s10489-022-03331-8

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Multi-modal neural machine translation (MNMT) mainly focuses on using image information to guide text translation. Recent MNMT approaches have shown that incorporating visual features into a textual translation framework helps improve machine translation. However, visual features often contain information unrelated to the text, and traditional MNMT methods rarely consider the problem of noisy visual feature fusion. How to extract useful visual features to enhance textual machine translation is therefore a key question for MNMT. In this paper, we propose a novel Dual-level Interactive Multimodal-Mixup Encoder (DLMulMix) based on multimodal-mixup for MNMT, which extracts useful visual features to enhance textual-level machine translation. We first employ Textual-visual Gating to extract text-related visual features, as we believe regional features are crucial for MNMT. Visual grid features are then employed to establish the image context of the effective regional features. Moreover, an effective visual-textual multimodal-mixup is adopted to align textual and visual features in a common multi-modal space to improve textual-level machine translation. We evaluate the proposed method on the Multi30K dataset. The experimental results show that the proposed approach outperforms previous work on both the EN-DE and EN-FR tasks in terms of BLEU and METEOR scores.
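The abstract names two operations, a textual-visual gate and a visual-textual mixup, without giving their formulas. The sketch below illustrates the general idea of each on plain Python lists. It is not the DLMulMix implementation. The function names, the parameter-free gate sigmoid(t_i * v_i), and the fixed mixing weight `lam` are all assumptions made for illustration; a trained model would use learned projections, and standard mixup usually samples the weight from a Beta distribution.

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def textual_visual_gate(text_vec, visual_vec):
    """Scale each visual dimension by a gate computed from both modalities.

    Assumption: the gate is sigmoid(t_i * v_i) with no learned weights.
    This only illustrates gating; it is not the paper's formulation.
    """
    gates = [sigmoid(t * v) for t, v in zip(text_vec, visual_vec)]
    return [g * v for g, v in zip(gates, visual_vec)]


def multimodal_mixup(text_vec, visual_vec, lam):
    """Convex combination of the two feature vectors, as in generic mixup.

    Assumption: lam is passed in by the caller and must lie in [0, 1].
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    return [lam * t + (1.0 - lam) * v for t, v in zip(text_vec, visual_vec)]


# Hypothetical toy features for one token and one image region.
text = [0.5, -1.0, 2.0]
visual = [1.0, 0.0, -2.0]

gated = textual_visual_gate(text, visual)          # damp visual dims that disagree with the text
mixed = multimodal_mixup(text, gated, lam=0.7)     # blend text with the gated visual features
```

In this toy gate, visual dimensions whose sign disagrees with the text are pushed towards zero, which is one simple way to picture the suppression of text-unrelated visual information before the two modalities are blended.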

Pages: 14194-14203

Page count: 10
