Video-guided machine translation via dual-level back-translation

Cited by: 8
Authors
Chen, Shiyu [1]
Zeng, Yawen [1]
Cao, Da [1]
Lu, Shaofei [1]
Affiliations
[1] Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410082, Hunan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Multiple modalities; Video-guided machine translation; Back-translation; Shared transformer
DOI
10.1016/j.knosys.2022.108598
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Video-guided machine translation aims to translate a source language description into a target language using the video information as additional spatio-temporal context. Existing methods focus on making full use of videos as auxiliary material, while ignoring the semantic consistency and reducibility between the source language and the target language. In addition, the visual concept is helpful for improving the alignment and translation of different languages but is rarely considered. Toward this end, we contribute a novel solution to thoroughly investigate the video-guided machine translation issue via dual-level back-translation. Specifically, we first exploit a sentence-level back-translation to obtain the coarse-grained semantics. Thereafter, a video concept-level back-translation module is proposed to explore the fine-grained semantic consistency and reducibility. Lastly, a multi-pattern joint learning approach is utilized to boost the translation performance. By experimenting on two real-world datasets, we demonstrate the effectiveness and rationality of our proposed solution. (c) 2022 Elsevier B.V. All rights reserved.
Pages: 9