Visual Topic Semantic Enhanced Machine Translation for Multi-Modal Data Efficiency

Cited by: 0
Authors
Chao Wang
Si-Jia Cai
Bei-Xiang Shi
Zhi-Hong Chong
Affiliations
[1] Southeast University, School of Computer Science and Engineering
[2] Southeast University, School of Architecture
Keywords
multi-modal machine translation; visual topic semantics; data efficiency
DOI
Not available
Abstract
The scarcity of bilingual parallel corpora limits the use of state-of-the-art supervised translation technology. One research direction is to exploit the relations among multi-modal data to enhance performance. However, reliance on manually annotated multi-modal datasets makes data labeling costly. In this paper, we propose using the topic semantics of images to alleviate this problem. First, topic-related images can be collected automatically from the Internet using search engines. Second, topic semantics is sufficient to encode the relations between multi-modal data such as texts and images. Specifically, we propose a visual topic semantic enhanced translation (VTSE) model that uses topic-related images to construct a cross-lingual and cross-modal semantic space, allowing the model to integrate syntactic structure and semantic features simultaneously. In this process, texts and images with similar topics are grouped together so that the model can extract more robust topic semantics from a set of similar images and then further optimize feature integration. The results show that our model outperforms competitive baselines by a large margin on the Multi30k and Ambiguous COCO datasets. Our model can use external images to improve translation, thereby improving data efficiency.
Pages: 1223 - 1236
Page count: 13
Source
JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2023, 38 (06) : 1223 - 1236