Vision talks: Visual relationship-enhanced transformer for video-guided machine translation

Cited by: 1
Authors
Chen, Shiyu [1]
Zeng, Yawen [1]
Cao, Da [1]
Lu, Shaofei [1]
Affiliation
[1] Hunan University, College of Computer Science and Electronic Engineering, Changsha 410082, Hunan, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Machine translation; Visual relationship; Transformer; Graph convolutional network
DOI
10.1016/j.eswa.2022.118264
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Video-guided machine translation is a promising task that aims to translate a source-language description into a target language, using the video as supplementary context. Most existing work uses the whole video as auxiliary information to improve translation performance. However, because visual information is a modality heterogeneous to text, using it this way introduces noise instead. To this end, we propose a novel visual relationship-enhanced transformer that constructs a semantic-visual relational graph as a cross-modal bridge. Specifically, the visual information is treated as a structured conceptual representation, which bridges the two modalities. A graph convolutional network is then deployed to capture the relationships among visual semantics. In this way, a transformer with a structured multi-modal fusion strategy can explore the cross-modal correlations. Finally, the proposed framework is optimized with a Kullback-Leibler divergence objective with label smoothing. Extensive experiments demonstrate the rationality and effectiveness of the proposed method compared with other state-of-the-art solutions.
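The abstract names two standard building blocks: a graph convolutional network (GCN) over visual-semantic nodes, and a Kullback-Leibler divergence objective with label smoothing. The pure-Python sketch below shows the textbook forms of both, assuming the symmetric-normalized GCN layer of Kipf and Welling and a smoothing scheme that spreads `eps` over the non-gold tokens. It is not the authors' implementation: it does not show how the paper builds its semantic-visual graph or fuses it into the transformer, and the function names, toy graph, features, weights, and `eps=0.1` are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def gcn_layer(adj, h, w):
    """One textbook GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj: n x n adjacency among visual-concept nodes (hypothetical graph),
    h:   n x d node features, w: d x d_out weight matrix.
    """
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization D^-1/2 (A + I) D^-1/2.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, h), w)
    return [[max(0.0, x) for x in row] for row in out]

def smoothed_target(gold, vocab_size, eps):
    """Label-smoothed target (assumed variant): 1 - eps on gold, eps shared by the rest."""
    off = eps / (vocab_size - 1)
    return [1.0 - eps if i == gold else off for i in range(vocab_size)]

def log_softmax(logits):
    """Numerically stable log-softmax."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def kl_label_smoothing_loss(logits, gold, eps=0.1):
    """KL(q || p) between the smoothed target q and the model distribution p."""
    q = smoothed_target(gold, len(logits), eps)
    log_p = log_softmax(logits)
    # Terms with q_i = 0 contribute nothing to the KL divergence.
    return sum(qi * (math.log(qi) - lpi) for qi, lpi in zip(q, log_p) if qi > 0)

if __name__ == "__main__":
    adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]   # three concept nodes in a chain
    h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy node features
    w = [[1.0, 0.0], [0.0, 1.0]]              # identity weights for readability
    print(gcn_layer(adj, h, w))
    print(kl_label_smoothing_loss([2.0, 0.5, -1.0], gold=0, eps=0.1))
```

With `eps=0` the smoothed target becomes one-hot and the loss reduces to ordinary cross-entropy; a positive `eps` keeps the model from putting all probability mass on a single token.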
Pages: 11
Related papers (21 in total)
  • [1] Video-guided Machine Translation with Spatial Hierarchical Attention Network
    Gu, Weiqi
    Song, Haiyue
    Chu, Chenhui
    Kurohashi, Sadao
    ACL-IJCNLP 2021: THE 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING: PROCEEDINGS OF THE STUDENT RESEARCH WORKSHOP, 2021: 87-92
  • [2] Video-guided machine translation via dual-level back-translation
    Chen, Shiyu
    Zeng, Yawen
    Cao, Da
    Lu, Shaofei
    KNOWLEDGE-BASED SYSTEMS, 2022, 245
  • [3] SiamSampler: Video-Guided Sampling for Siamese Visual Tracking
    Li, Peixia
    Chen, Boyu
    Bai, Lei
    Qiao, Lei
    Li, Bo
    Ouyang, Wanli
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (04): 1752-1761
  • [4] Attention Guided CAM: Visual Explanations of Vision Transformer Guided by Self-Attention
    Leem, Saebom
    Seo, Hyunseok
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 4, 2024: 2956-2964
  • [5] Deep learning-guided video compression for machine vision tasks
    Kim, Aro
    Woo, Seung-taek
    Park, Minho
    Kim, Dong-hwi
    Lim, Hanshin
    Jung, Soon-heung
    Kwak, Sangwoon
    Park, Sang-hyo
    EURASIP JOURNAL ON IMAGE AND VIDEO PROCESSING, 2024, 2024 (01)
  • [6] MIMTracking: Masked image modeling enhanced vision transformer for visual object tracking
    Zhang, Shuo
    Zhang, Dan
    Zou, Qi
    NEUROCOMPUTING, 2024, 606
  • [7] X-Transformer: A Machine Translation Model Enhanced by the Self-Attention Mechanism
    Liu, Huey-Ing
    Chen, Wei-Lin
    APPLIED SCIENCES-BASEL, 2022, 12 (09)
  • [8] Improving End-to-End Sign Language Translation With Adaptive Video Representation Enhanced Transformer
    Liu, Zidong
    Wu, Jiasong
    Shen, Zeyu
    Chen, Xin
    Wu, Qianyu
    Gui, Zhiguo
    Senhadji, Lotfi
    Shu, Huazhong
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (09): 8327-8342
  • [9] Audio-Visual Praise Estimation for Conversational Video based on Synchronization-Guided Multimodal Transformer
    Hojo, Nobukatsu
    Mizuno, Saki
    Kobashikawa, Satoshi
    Masumura, Ryo
    Ihori, Mana
    Sato, Hiroshi
    Tanaka, Tomohiro
    INTERSPEECH 2023, 2023: 2663-2667
  • [10] TokenMotion: Motion-Guided Vision Transformer for Video Camouflaged Object Detection via Learnable Token Selection
    Yu, Zifan
    Tavakoli, Erfan Bank
    Chen, Meida
    You, Suya
    Rao, Raghuveer
    Agarwal, Sanjeev
    Ren, Fengbo
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024, 2024: 2875-2879