Dual-level interactive multimodal-mixup encoder for multi-modal neural machine translation

Cited by: 9
Authors
Ye, Junjie [1,2]
Guo, Junjun [1,2]
Affiliations
[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming 650500, Yunnan, Peoples R China
[2] Yunnan Key Lab Artificial Intelligence, Kunming 650500, Yunnan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multi-modal neural machine translation; Dual-level interactive multimodal-mixup encoder; Transformer; Feature fusion;
DOI
10.1007/s10489-022-03331-8
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multi-modal neural machine translation (MNMT) focuses on using image information to guide text translation. Recent MNMT approaches have shown that incorporating visual features into a textual translation framework helps improve machine translation. However, visual features often contain information unrelated to the text, and this noisy visual feature fusion problem is rarely considered by traditional MNMT methods. How to extract useful visual features to enhance textual machine translation is therefore a key question for MNMT. In this paper, we propose a novel Dual-level Interactive Multimodal-Mixup Encoder (DLMulMix) based on multimodal-mixup for MNMT, which extracts useful visual features to enhance textual-level machine translation. We first employ Textual-visual Gating to extract text-related regional visual features, since we believe regional features are crucial for MNMT. Visual grid features are then employed to establish the image context of the effective regional features. Moreover, an effective visual-textual multimodal-mixup is adopted to align textual and visual features in a multi-modal common space and thereby improve textual-level machine translation. We evaluate the proposed method on the Multi30K dataset. The experimental results show that our approach outperforms previous efforts on both EN-DE and EN-FR tasks in terms of BLEU and METEOR scores.
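Since this record contains only the abstract, the following is a minimal PyTorch-style sketch of the two ideas the abstract names: a textual-visual gate that damps text-unrelated visual information, and a mixup-style interpolation that places textual and (pooled) visual features in a common space. All module names, tensor shapes, pooling choices, and the Beta-distributed mixing coefficient are illustrative assumptions, not the authors' DLMulMix implementation.

# Hypothetical sketch (not the authors' released code): textual-visual gating
# followed by a mixup-style interpolation of textual and visual features.
import torch
import torch.nn as nn


class TextualVisualGate(nn.Module):
    """Gate regional/grid visual features by their relevance to the sentence."""

    def __init__(self, d_model: int):
        super().__init__()
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, text: torch.Tensor, visual: torch.Tensor) -> torch.Tensor:
        # text:   (batch, len_t, d_model) textual encoder states
        # visual: (batch, len_v, d_model) projected regional or grid features
        text_ctx = text.mean(dim=1, keepdim=True).expand(-1, visual.size(1), -1)
        g = torch.sigmoid(self.gate(torch.cat([text_ctx, visual], dim=-1)))
        return g * visual  # text-unrelated visual dimensions are damped toward zero


def multimodal_mixup(text: torch.Tensor, gated_visual: torch.Tensor,
                     alpha: float = 0.2) -> torch.Tensor:
    """Mixup-style interpolation of textual states with a pooled visual context."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    visual_ctx = gated_visual.mean(dim=1, keepdim=True).expand_as(text)
    return lam * text + (1.0 - lam) * visual_ctx


if __name__ == "__main__":
    d_model = 512
    text = torch.randn(2, 20, d_model)    # toy textual encoder output
    visual = torch.randn(2, 36, d_model)  # toy regional visual features
    fused = multimodal_mixup(text, TextualVisualGate(d_model)(text, visual))
    print(fused.shape)  # torch.Size([2, 20, 512])

The interpolation follows the standard mixup rule, h = lambda * h_text + (1 - lambda) * h_visual with lambda drawn from Beta(alpha, alpha); how DLMulMix combines this with its dual-level (regional plus grid) visual features is detailed in the paper itself.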
Pages: 14194-14203
Number of pages: 10
Related Papers
50 records in total
  • [21] Video-guided machine translation via dual-level back-translation
    Chen, Shiyu
    Zeng, Yawen
    Cao, Da
    Lu, Shaofei
    KNOWLEDGE-BASED SYSTEMS, 2022, 245
  • [22] Multi-modal simultaneous machine translation fusion of image information
    Huang, Yan
Wang, Zhanyang
    Zhang, TianYuan
    Xu, Chun
Liang, Hui
    JOURNAL OF ENGINEERING RESEARCH, 2023, 11 (02):
  • [23] Multi-Modal Approaches for Post-Editing Machine Translation
    Herbig, Nico
    Pal, Santanu
    van Genabith, Josef
    Krueger, Antonio
    CHI 2019: PROCEEDINGS OF THE 2019 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS, 2019,
  • [24] Visual Agreement Regularized Training for Multi-Modal Machine Translation
    Yang, Pengcheng
    Chen, Boxing
    Zhang, Pei
    Sun, Xu
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 9418 - 9425
  • [25] Progressive modality-complement aggregative multitransformer for domain multi-modal neural machine translation
    Guo, Junjun
    Hou, Zhenyu
    Xian, Yantuan
    Yu, Zhengtao
    PATTERN RECOGNITION, 2024, 149
  • [26] MBIAN: Multi-level bilateral interactive attention network for multi-modal
    Sun, Kai
    Zhang, Jiangshe
    Wang, Jialin
    Xu, Shuang
    Zhang, Chunxia
    Hu, Junying
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 231
  • [27] Interactive natural language acquisition in a multi-modal recurrent neural architecture
    Heinrich, Stefan
    Wermter, Stefan
    CONNECTION SCIENCE, 2018, 30 (01) : 99 - 133
  • [28] MMPE: A Multi-Modal Interface for Post-Editing Machine Translation
    Herbig, Nico
    Duewel, Tim
    Pal, Santanu
    Meladaki, Kalliopi
    Monshizadeh, Mahsa
    Krueger, Antonio
    van Genabith, Josef
    58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), 2020, : 1691 - 1702
  • [29] HybridVocab: Towards Multi-Modal Machine Translation via Multi-Aspect Alignment
    Peng, Ru
    Zeng, Yawen
    Zhao, Junbo
    PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2022, 2022, : 380 - 388
  • [30] Hausa Visual Genome: A Dataset for Multi-Modal English to Hausa Machine Translation
    Abdulmumin, Idris
    Dash, Satya Ranjan
    Dawud, Musa Abdullahi
    Parida, Shantipriya
    Muhammad, Shamsuddeen Hassan
    Ahmad, Ibrahim Sa'id
    Panda, Subhadarshi
    Bojar, Ondrej
    Galadanci, Bashir Shehu
    Bello, Shehu Bello
    LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 6471 - 6479