Dual-level interactive multimodal-mixup encoder for multi-modal neural machine translation

Cited by: 9
|
Authors
Ye, Junjie [1 ,2 ]
Guo, Junjun [1 ,2 ]
Institutions
[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming 650500, Yunnan, Peoples R China
[2] Yunnan Key Lab Artificial Intelligence, Kunming 650500, Yunnan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multi-modal neural machine translation; Dual-level interactive multimodal-mixup encoder; Transformer; Feature fusion;
DOI
10.1007/s10489-022-03331-8
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Multi-modal neural machine translation (MNMT) focuses mainly on using image information to guide text translation. Recent MNMT approaches have shown that incorporating visual features into a textual translation framework helps improve machine translation. However, visual features always contain text-unrelated information, and this noisy visual feature fusion problem is rarely considered in traditional MNMT methods. How to extract useful visual features to enhance textual machine translation is therefore the key question for MNMT. In this paper, we propose a novel Dual-level Interactive Multimodal-Mixup Encoder (DLMulMix) based on multimodal-mixup for MNMT, which extracts useful visual features to enhance textual-level machine translation. We first employ textual-visual gating to extract text-related regional visual features, which we believe are crucial for MNMT. Visual grid features are then employed to establish the image context of the effective regional features. Moreover, an effective visual-textual multimodal-mixup is adopted to align textual and visual features in a multi-modal common space, improving textual-level machine translation. We evaluate the proposed method on the Multi30K dataset. The experimental results show that our approach outperforms previous efforts on both EN-DE and EN-FR tasks in terms of BLEU and METEOR scores.
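The abstract names two mechanisms, textual-visual gating and visual-textual multimodal-mixup, without giving their equations here. The following is a minimal, hypothetical PyTorch sketch of how such components might look; the module names (TextualVisualGate, multimodal_mixup), the gate design, the Beta-sampled mixing coefficient, and all tensor shapes are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of the two abstract-level components, assuming PyTorch
# and that text/visual features are already projected to a shared width.
import torch
import torch.nn as nn

class TextualVisualGate(nn.Module):
    """Gate regional visual features by their relevance to the sentence (assumed design)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.gate = nn.Linear(2 * d_model, 1)

    def forward(self, text: torch.Tensor, regions: torch.Tensor) -> torch.Tensor:
        # text:    (batch, n_tokens,  d_model) textual encoder states
        # regions: (batch, n_regions, d_model) regional visual features
        ctx = text.mean(dim=1, keepdim=True).expand(-1, regions.size(1), -1)
        g = torch.sigmoid(self.gate(torch.cat([regions, ctx], dim=-1)))
        return g * regions  # suppress text-unrelated regions

def multimodal_mixup(text: torch.Tensor, visual: torch.Tensor,
                     alpha: float = 0.2) -> torch.Tensor:
    # Mixup-style interpolation of the two modalities in a common space;
    # assumes both tensors share the same (batch, length, d_model) shape.
    lam = torch.distributions.Beta(alpha, alpha).sample().to(text.device)
    return lam * text + (1.0 - lam) * visual
```

In this reading, the gate suppresses regions weakly related to the sentence context before the mixup interpolates the two modalities in a shared space, echoing the paper's stated goal of filtering noisy visual features prior to fusion.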
Pages: 14194-14203
Number of pages: 10
Related Papers
50 records in total
  • [41] Interactive Multi-System Machine Translation with Neural Language Models
    Rikters, Matiss
    DATABASES AND INFORMATION SYSTEMS IX, 2016, 291 : 269 - 280
  • [42] Multi-Modal Neural Conditional Ordinal Random Fields for Agreement Level Estimation
    Rakicevic, Nemanja
    Rudovic, Ognjen
    Petridis, Stavros
    Pantic, Maja
    2016 23RD INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2016, : 2228 - 2233
  • [43] Discovering Multimodal Hierarchical Structures with Graph Neural Networks for Multi-modal and Multi-hop Question Answering
    Zhang, Qing
    Lv, Haocheng
    Liu, Jie
    Chen, Zhiyun
    Duan, Jianyong
    Xv, Mingying
    Wang, Hao
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT I, 2024, 14425 : 383 - 394
  • [44] Multi-modal indicators for estimating perceived cognitive load in post-editing of machine translation
    Herbig, Nico
    Pal, Santanu
    Vela, Mihaela
    Krueger, Antonio
    van Genabith, Josef
    MACHINE TRANSLATION, 2019, 33 (1-2) : 91 - 115
  • [45] Multi-Level Cross-Modal Interactive-Network-Based Semi-Supervised Multi-Modal Ship Classification
    Song, Xin
    Chen, Zhikui
    Zhong, Fangming
    Gao, Jing
    Zhang, Jianning
    Li, Peng
    SENSORS, 2024, 24 (22)
  • [46] Does Multi-Encoder Help? A Case Study on Context-Aware Neural Machine Translation
    Li, Bei
    Liu, Hui
    Wang, Ziyang
    Jiang, Yufan
    Xiao, Tong
    Zhu, Jingbo
    Liu, Tongran
    Li, Changliang
    58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), 2020, : 3512 - 3518
  • [47] Multi-modal transcriptomics: integrating machine learning and convolutional neural networks to identify immune biomarkers in atherosclerosis
    Chen, Haiqing
    Lai, Haotian
    Chi, Hao
    Fan, Wei
    Huang, Jinbang
    Zhang, Shengke
    Jiang, Chenglu
    Jiang, Lai
    Hu, Qingwen
    Yan, Xiuben
    Chen, Yemeng
    Zhang, Jieying
    Yang, Guanhu
    Liao, Bin
    Wan, Juyi
    FRONTIERS IN CARDIOVASCULAR MEDICINE, 2024, 11
  • [48] Single-shot hyperspectral imaging based on dual attention neural network with multi-modal learning
    He, Tianyue
    Zhang, Qican
    Zhou, Mingwei
    Kou, Tingdong
    Shen, Junfei
    OPTICS EXPRESS, 2022, 30 (06) : 9790 - 9813
  • [49] An Adaptive Dual-channel Multi-modal graph neural network for few-shot learning
    Yang, Jieyi
    Dong, Yihong
    Li, Guoqing
    KNOWLEDGE-BASED SYSTEMS, 2025, 310
  • [50] Multi-modal interactive attention and dual progressive decoding network for RGB-D/T salient object detection
    Liang, Yanhua
    Qin, Guihe
    Sun, Minghui
    Qin, Jun
    Yan, Jie
    Zhang, Zhonghan
    NEUROCOMPUTING, 2022, 490 : 132 - 145