Visual Topic Semantic Enhanced Machine Translation for Multi-Modal Data Efficiency

Cited by: 0
Authors
Chao Wang
Si-Jia Cai
Bei-Xiang Shi
Zhi-Hong Chong
Institutions
[1] Southeast University, School of Computer Science and Engineering
[2] Southeast University, School of Architecture
Keywords
multi-modal machine translation; visual topic semantics; data efficiency
DOI: not available
Abstract
The scarcity of bilingual parallel corpora limits the exploitation of state-of-the-art supervised translation technology. One research direction is to exploit relations among multi-modal data to enhance translation performance. However, reliance on manually annotated multi-modal datasets incurs a high data-labeling cost. In this paper, we propose using the topic semantics of images to alleviate this problem. First, topic-related images can be collected automatically from the Internet via search engines. Second, topic semantics is sufficient to encode the relations between multi-modal data such as texts and images. Specifically, we propose a visual topic semantic enhanced translation (VTSE) model that utilizes topic-related images to construct a cross-lingual, cross-modal semantic space, allowing the model to integrate syntactic structure and semantic features simultaneously. In this process, topic-similar texts and images are grouped so that the model can extract more robust topic semantics from a set of similar images and further optimize the feature integration. Experimental results show that our model outperforms competitive baselines by a large margin on the Multi30K and Ambiguous COCO datasets. Our model can thus exploit external images to improve translation, increasing data efficiency.
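To make the grouping-and-fusion idea concrete, below is a minimal PyTorch sketch of one plausible reading of the abstract: attention-pool a topic vector from a group of topic-similar images, then gate it into the encoded source text. The class name TopicSemanticFusion, the feature dimensions, the attention pooling, and the gating mechanism are all illustrative assumptions, not the paper's actual VTSE architecture.

import torch
import torch.nn as nn

class TopicSemanticFusion(nn.Module):
    """Hypothetical sketch: pool topic semantics from a group of
    topic-similar images and fuse them into the text representation."""
    def __init__(self, text_dim=512, img_dim=2048, hidden=512):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, hidden)    # project image features into a shared space
        self.text_proj = nn.Linear(text_dim, hidden)  # project text features into the same space
        self.attn = nn.Linear(hidden, 1)              # score each image in the topic group
        self.gate = nn.Linear(2 * hidden, hidden)     # per-token gate controlling visual injection

    def forward(self, text_feats, group_img_feats):
        # text_feats:      (batch, seq_len, text_dim)  encoded source sentence
        # group_img_feats: (batch, n_imgs, img_dim)    features of topic-similar images
        imgs = torch.tanh(self.img_proj(group_img_feats))           # (batch, n_imgs, hidden)
        weights = torch.softmax(self.attn(imgs), dim=1)             # attend over the image group
        topic = (weights * imgs).sum(dim=1, keepdim=True)           # (batch, 1, hidden) pooled topic vector
        text = torch.tanh(self.text_proj(text_feats))               # (batch, seq_len, hidden)
        topic = topic.expand(-1, text.size(1), -1)                  # broadcast topic vector over tokens
        g = torch.sigmoid(self.gate(torch.cat([text, topic], -1)))  # learned fusion gate
        return text + g * topic                                     # topic-enhanced text representation

# Usage with stand-in random features:
fusion = TopicSemanticFusion()
text = torch.randn(2, 10, 512)    # 2 sentences, 10 tokens each
images = torch.randn(2, 5, 2048)  # 5 topic-related images per sentence
enhanced = fusion(text, images)   # -> (2, 10, 512)

Pooling over a group of images, rather than conditioning on a single image, is what the abstract suggests makes the extracted topic semantics robust to noise in any individual web-collected image.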
Pages: 1223-1236 (13 pages)