Dual-level interactive multimodal-mixup encoder for multi-modal neural machine translation

Cited by: 9
Authors
Ye, Junjie [1, 2]
Guo, Junjun [1, 2]
Affiliations
[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming 650500, Yunnan, Peoples R China
[2] Yunnan Key Lab Artificial Intelligence, Kunming 650500, Yunnan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Multi-modal neural machine translation; Dual-level interactive multimodal-mixup encoder; Transformer; Feature fusion;
DOI
10.1007/s10489-022-03331-8
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Multi-modal neural machine translation (MNMT) mainly focuses on using image information to guide text translation. Recent MNMT approaches have shown that incorporating visual features into a textual translation framework helps improve machine translation. However, visual features always contain information unrelated to the text, and the problem of noisy visual feature fusion is rarely considered in traditional MNMT methods. How to extract useful visual features to enhance textual machine translation is the key question for MNMT. In this paper, we propose a novel Dual-level Interactive Multimodal-Mixup Encoder (DLMulMix) based on multimodal-mixup for MNMT, which extracts useful visual features to enhance textual-level machine translation. We first employ Textual-visual Gating to extract text-related visual features, as we believe that regional features are crucial for MNMT. Visual grid features are then employed to establish the image context of the effective regional features. Moreover, an effective visual-textual multimodal-mixup is adopted to align textual and visual features in a multi-modal common space to improve textual-level machine translation. We evaluate the proposed method on the Multi30K dataset. The experimental results show that the proposed approach outperforms previous work on both the EN-DE and EN-FR tasks in terms of BLEU and METEOR scores.
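The abstract names two operations, a textual-visual gate and a visual-textual mixup, without giving their formulas. The following is a minimal illustrative sketch of what such operations commonly look like, not the paper's actual formulation: it assumes an element-wise sigmoid gate and a generic mixup-style convex combination. The function names, the per-dimension weights `w_text` and `w_visual`, and the coefficient `lam` are all hypothetical.

```python
import math
import random


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def textual_visual_gate(text_vec, visual_vec, w_text, w_visual):
    """Scale each visual dimension by a gate in (0, 1).

    Assumption: the gate is an element-wise sigmoid over a weighted sum of
    the text and visual values; the paper may define it differently.
    """
    gated = []
    for t, v, wt, wv in zip(text_vec, visual_vec, w_text, w_visual):
        gate = sigmoid(wt * t + wv * v)
        gated.append(gate * v)
    return gated


def multimodal_mixup(text_vec, visual_vec, lam):
    """Convex combination of two feature vectors (generic mixup-style
    interpolation); lam = 1 keeps only the text features."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * t + (1.0 - lam) * v for t, v in zip(text_vec, visual_vec)]


if __name__ == "__main__":
    random.seed(0)
    text = [0.5, -1.0, 2.0]       # toy text feature vector
    visual = [1.0, 0.3, -0.7]     # toy visual feature vector
    w_text = [random.uniform(-1, 1) for _ in text]      # hypothetical weights
    w_visual = [random.uniform(-1, 1) for _ in visual]  # hypothetical weights
    gated_visual = textual_visual_gate(text, visual, w_text, w_visual)
    print(multimodal_mixup(text, gated_visual, lam=0.8))
```

In this sketch the gate can only shrink a visual value toward zero, which is one simple way to suppress visual dimensions that are unrelated to the text before the two modalities are mixed.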
Pages: 14194-14203
Number of pages: 10
Related papers
50 in total
  • [1] Dual-level interactive multimodal-mixup encoder for multi-modal neural machine translation
    Ye, Junjie
    Guo, Junjun
    Applied Intelligence, 2022, 52 : 14194 - 14203
  • [2] A Novel Graph-based Multi-modal Fusion Encoder for Neural Machine Translation
    Yin, Yongjing
    Meng, Fandong
    Su, Jinsong
    Zhou, Chulun
    Yang, Zhengyuan
    Zhou, Jie
    Luo, Jiebo
    58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), 2020, : 3025 - 3035
  • [3] Unsupervised Multi-modal Neural Machine Translation
    Su, Yuanhang
    Fan, Kai
    Nguyen Bach
    Kuo, C-C Jay
    Huang, Fei
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 10474 - 10483
  • [4] Multi-modal neural machine translation with deep semantic interactions
    Su, Jinsong
    Chen, Jinchang
    Jiang, Hui
    Zhou, Chulun
    Lin, Huan
    Ge, Yubin
    Wu, Qingqiang
    Lai, Yongxuan
    INFORMATION SCIENCES, 2021, 554 : 47 - 60
  • [5] Multi-modal graph contrastive encoding for neural machine translation
    Yin, Yongjing
    Zeng, Jiali
    Su, Jinsong
    Zhou, Chulun
    Meng, Fandong
    Zhou, Jie
    Huang, Degen
    Luo, Jiebo
    ARTIFICIAL INTELLIGENCE, 2023, 323
  • [6] Adding visual attention into encoder-decoder model for multi-modal machine translation
    Xu, Chun
    Yu, Zhengqing
    Shi, Xiayang
    Chen, Fang
    JOURNAL OF ENGINEERING RESEARCH, 2023, 11 (02):
  • [7] DsMCL: Dual-Level Stochastic Multiple Choice Learning for Multi-Modal Trajectory Prediction
    Wang, Zehan
    Zhou, Sihong
    Huang, Yuyao
    Tian, Wei
    2020 IEEE 23RD INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS (ITSC), 2020,
  • [8] Learning to decode to future success for multi-modal neural machine translation
    Huang, Yan
    Zhang, TianYuan
    Xu, Chun
    JOURNAL OF ENGINEERING RESEARCH, 2023, 11 (02):
  • [9] Doubly-Attentive Decoder for Multi-modal Neural Machine Translation
    Calixto, Iacer
    Liu, Qun
    Campbell, Nick
    PROCEEDINGS OF THE 55TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2017), VOL 1, 2017, : 1913 - 1924
  • [10] Layer-Level Progressive Transformer With Modality Difference Awareness for Multi-Modal Neural Machine Translation
    Guo, Junjun
    Ye, Junjie
    Xiang, Yan
    Yu, Zhengtao
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 3015 - 3026