Dual-level interactive multimodal-mixup encoder for multi-modal neural machine translation

Cited by: 9
|
Authors
Ye, Junjie [1 ,2 ]
Guo, Junjun [1 ,2 ]
Institutions
[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming 650500, Yunnan, Peoples R China
[2] Yunnan Key Lab Artificial Intelligence, Kunming 650500, Yunnan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multi-modal neural machine translation; Dual-level interactive multimodal-mixup encoder; Transformer; Feature fusion;
DOI
10.1007/s10489-022-03331-8
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Multi-modal neural machine translation (MNMT) focuses mainly on using image information to guide text translation. Recent MNMT approaches have shown that incorporating visual features into a textual translation framework helps improve machine translation. However, visual features always contain text-unrelated information, and this noisy visual feature fusion problem is rarely considered in traditional MNMT methods. How to extract useful visual features to enhance textual machine translation is therefore the key question for MNMT. In this paper, we propose a novel Dual-level Interactive Multimodal-Mixup Encoder (DLMulMix) based on multimodal-mixup for MNMT, which extracts useful visual features to enhance textual-level machine translation. We first employ textual-visual gating to extract text-related regional visual features, which we believe are crucial for MNMT. Visual grid features are then employed to establish the image context of the effective regional features. Moreover, an effective visual-textual multimodal-mixup is adopted to align textual and visual features in a multi-modal common space, improving textual-level machine translation. We evaluate the proposed method on the Multi30K dataset. The experimental results show that our approach outperforms previous efforts on both EN-DE and EN-FR tasks in terms of BLEU and METEOR scores.
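The abstract names two mechanisms, textual-visual gating and visual-textual multimodal-mixup, without giving their equations here. The following is a minimal, hypothetical PyTorch sketch of how such components might look; the module names (TextualVisualGate, multimodal_mixup), the gate design, the Beta-sampled mixing coefficient, and all tensor shapes are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of the two abstract-level components, assuming PyTorch
# and that text/visual features are already projected to a shared width.
import torch
import torch.nn as nn

class TextualVisualGate(nn.Module):
    """Gate regional visual features by their relevance to the sentence (assumed design)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.gate = nn.Linear(2 * d_model, 1)

    def forward(self, text: torch.Tensor, regions: torch.Tensor) -> torch.Tensor:
        # text:    (batch, n_tokens,  d_model) textual encoder states
        # regions: (batch, n_regions, d_model) regional visual features
        ctx = text.mean(dim=1, keepdim=True).expand(-1, regions.size(1), -1)
        g = torch.sigmoid(self.gate(torch.cat([regions, ctx], dim=-1)))
        return g * regions  # suppress text-unrelated regions

def multimodal_mixup(text: torch.Tensor, visual: torch.Tensor,
                     alpha: float = 0.2) -> torch.Tensor:
    # Mixup-style interpolation of the two modalities in a common space;
    # assumes both tensors share the same (batch, length, d_model) shape.
    lam = torch.distributions.Beta(alpha, alpha).sample().to(text.device)
    return lam * text + (1.0 - lam) * visual
```

In this reading, the gate suppresses regions weakly related to the sentence context before the mixup interpolates the two modalities in a shared space, echoing the paper's stated goal of filtering noisy visual features prior to fusion.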
Pages: 14194-14203
Number of pages: 10
Related Papers
50 records in total
  • [41] Interactive Multi-System Machine Translation with Neural Language Models
    Rikters, Matiss
    DATABASES AND INFORMATION SYSTEMS IX, 2016, 291 : 269 - 280
  • [42] Multi-Modal Neural Conditional Ordinal Random Fields for Agreement Level Estimation
    Rakicevic, Nemanja
    Rudovic, Ognjen
    Petridis, Stavros
    Pantic, Maja
    2016 23RD INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2016, : 2228 - 2233
  • [43] Discovering Multimodal Hierarchical Structures with Graph Neural Networks for Multi-modal and Multi-hop Question Answering
    Zhang, Qing
    Lv, Haocheng
    Liu, Jie
    Chen, Zhiyun
    Duan, Jianyong
    Xv, Mingying
    Wang, Hao
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT I, 2024, 14425 : 383 - 394
  • [44] Multi-modal indicators for estimating perceived cognitive load in post-editing of machine translation
    Herbig, Nico
    Pal, Santanu
    Vela, Mihaela
    Krueger, Antonio
    van Genabith, Josef
    MACHINE TRANSLATION, 2019, 33 (1-2) : 91 - 115
  • [45] Multi-Level Cross-Modal Interactive-Network-Based Semi-Supervised Multi-Modal Ship Classification
    Song, Xin
    Chen, Zhikui
    Zhong, Fangming
    Gao, Jing
    Zhang, Jianning
    Li, Peng
    SENSORS, 2024, 24 (22)
  • [46] Does Multi-Encoder Help? A Case Study on Context-Aware Neural Machine Translation
    Li, Bei
    Liu, Hui
    Wang, Ziyang
    Jiang, Yufan
    Xiao, Tong
    Zhu, Jingbo
    Liu, Tongran
    Li, Changliang
    58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), 2020, : 3512 - 3518
  • [47] Multi-modal transcriptomics: integrating machine learning and convolutional neural networks to identify immune biomarkers in atherosclerosis
    Chen, Haiqing
    Lai, Haotian
    Chi, Hao
    Fan, Wei
    Huang, Jinbang
    Zhang, Shengke
    Jiang, Chenglu
    Jiang, Lai
    Hu, Qingwen
    Yan, Xiuben
    Chen, Yemeng
    Zhang, Jieying
    Yang, Guanhu
    Liao, Bin
    Wan, Juyi
    FRONTIERS IN CARDIOVASCULAR MEDICINE, 2024, 11
  • [48] Single-shot hyperspectral imaging based on dual attention neural network with multi-modal learning
    He, Tianyue
    Zhang, Qican
    Zhou, Mingwei
    Kou, Tingdong
    Shen, Junfei
    OPTICS EXPRESS, 2022, 30 (06) : 9790 - 9813
  • [49] An Adaptive Dual-channel Multi-modal graph neural network for few-shot learning
    Yang, Jieyi
    Dong, Yihong
    Li, Guoqing
    KNOWLEDGE-BASED SYSTEMS, 2025, 310
  • [50] Multi-modal interactive attention and dual progressive decoding network for RGB-D/T salient object detection
    Liang, Yanhua
    Qin, Guihe
    Sun, Minghui
    Qin, Jun
    Yan, Jie
    Zhang, Zhonghan
    NEUROCOMPUTING, 2022, 490 : 132 - 145