Visual Topic Semantic Enhanced Machine Translation for Multi-Modal Data Efficiency

Cited by: 0
Authors
Chao Wang
Si-Jia Cai
Bei-Xiang Shi
Zhi-Hong Chong
Affiliations
[1] Southeast University,School of Computer Science and Engineering
[2] Southeast University,School of Architecture
Keywords
multi-modal machine translation; visual topic semantics; data efficiency;
DOI: not available
Abstract
The scarcity of bilingual parallel corpora limits the use of state-of-the-art supervised translation technology. One research direction is to exploit relations among multi-modal data to enhance performance; however, the reliance on manually annotated multi-modal datasets makes data labeling costly. In this paper, we propose using the topic semantics of images to alleviate this problem. First, topic-related images can be collected automatically from the Internet with search engines. Second, topic semantics is sufficient to encode the relations between multi-modal data such as texts and images. Specifically, we propose a visual topic semantic enhanced translation (VTSE) model that uses topic-related images to construct a cross-lingual, cross-modal semantic space, allowing the model to integrate syntactic structure and semantic features simultaneously. In this process, topic-similar texts and images are wrapped into groups so that the model can extract more robust topic semantics from a set of similar images and then further optimize the feature integration. The results show that our model outperforms competitive baselines by a large margin on the Multi30k and Ambiguous COCO datasets. Moreover, our model can use external images to bring gains to translation, improving data efficiency.
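The grouping idea above can be sketched in a few lines: pool the features of a group of topic-similar images into one robust topic vector, then blend it with the text representation. This is only a minimal illustrative sketch; mean pooling and the mixing weight `alpha` are assumptions for exposition, not the paper's actual VTSE fusion mechanism.

```python
import numpy as np

def topic_semantics(image_feats):
    """Pool a group of topic-similar image features (n_images x dim)
    into one robust topic vector via mean pooling, L2-normalized."""
    v = np.mean(image_feats, axis=0)
    return v / np.linalg.norm(v)

def fuse(text_feat, topic_feat, alpha=0.5):
    """Blend a text feature with the visual topic vector.
    alpha is a hypothetical mixing weight, not taken from the paper."""
    return (1 - alpha) * text_feat + alpha * topic_feat

rng = np.random.default_rng(0)
group = rng.normal(size=(5, 8))   # five topic-related images, 8-dim features
topic = topic_semantics(group)    # one unit-norm topic vector
text = rng.normal(size=8)         # a text feature in the shared space
fused = fuse(text, topic, alpha=0.3)
```

Pooling over a set of similar images, rather than relying on a single aligned image, is what makes the extracted topic vector robust to any one noisy retrieval result.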
Pages: 1223-1236
Number of pages: 13