Dual Learning with Dynamic Knowledge Distillation for Partially Relevant Video Retrieval

Times cited: 7
Authors
Dong, Jianfeng [1, 2]
Zhang, Minsong [1]
Zhang, Zheng [1]
Chen, Xianke [1]
Liu, Daizong [3]
Qu, Xiaoye [4]
Wang, Xun [1, 2]
Liu, Baolong [1, 2]
Affiliations
[1] Zhejiang Gongshang University, Hangzhou, People's Republic of China
[2] Zhejiang Key Laboratory of E-Commerce, Hangzhou, People's Republic of China
[3] Peking University, Beijing, People's Republic of China
[4] Huazhong University of Science and Technology, Wuhan, People's Republic of China
Source
2023 IEEE/CVF International Conference on Computer Vision (ICCV 2023), 2023
Funding
National Natural Science Foundation of China
DOI
10.1109/ICCV51070.2023.01038
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Almost all previous text-to-video retrieval works assume that videos are pre-trimmed to short durations. In practice, however, videos are generally untrimmed and contain a large amount of background content. In this work, we investigate the more practical but challenging Partially Relevant Video Retrieval (PRVR) task, which aims to retrieve untrimmed videos that are partially relevant to a given text query. In particular, we address PRVR from a new perspective: distilling generalization knowledge from a large-scale vision-language pre-trained model and transferring it to a task-specific PRVR network. Specifically, we introduce a Dual Learning framework with Dynamic Knowledge Distillation (DL-DKD), which uses a large vision-language model as the teacher to guide a student model. During knowledge distillation, an inheritance student branch is devised to absorb knowledge from the teacher model. Because the large model may perform only moderately on PRVR owing to domain gaps, we further develop an exploration student branch to exploit task-specific information. In addition, a dynamic knowledge distillation strategy adjusts the contribution of each student branch during training. Experimental results demonstrate that the proposed model achieves state-of-the-art PRVR performance on the ActivityNet and TVR datasets.
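To make the dual-branch idea in the abstract more concrete, the following is a minimal Python sketch under stated assumptions; it is not the authors' implementation and the paper's actual losses may differ. It assumes that each branch outputs similarity scores between one text query and a set of candidate videos; that the inheritance branch is distilled from the teacher's scores with a temperature-scaled softmax and KL divergence (a generic Hinton-style distillation swapped in for illustration); that the exploration branch is trained with cross-entropy against the index of the relevant video; and that the dynamic strategy is a hypothetical linear schedule shifting weight from the inheritance loss to the exploration loss over epochs. All function names (`softmax`, `inheritance_loss`, `exploration_loss`, `dynamic_weight`, `total_loss`) are hypothetical.

```python
import math
from typing import List


def softmax(scores: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over query-to-video similarity scores."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q) for two discrete distributions over the same candidate videos."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def inheritance_loss(teacher_scores: List[float], student_scores: List[float],
                     temperature: float = 1.0) -> float:
    """Assumed distillation term: match the teacher's ranking distribution."""
    p_teacher = softmax(teacher_scores, temperature)
    p_student = softmax(student_scores, temperature)
    return kl_divergence(p_teacher, p_student)


def exploration_loss(student_scores: List[float], positive_index: int,
                     temperature: float = 1.0) -> float:
    """Assumed task-specific term: cross-entropy against the relevant video."""
    probs = softmax(student_scores, temperature)
    return -math.log(probs[positive_index])


def dynamic_weight(epoch: int, total_epochs: int) -> float:
    """Hypothetical linear schedule: 1.0 at the first epoch, 0.0 at the last."""
    return max(0.0, 1.0 - epoch / max(1, total_epochs - 1))


def total_loss(teacher_scores: List[float], inherit_scores: List[float],
               explore_scores: List[float], positive_index: int,
               epoch: int, total_epochs: int, temperature: float = 1.0) -> float:
    """Weighted sum of both student-branch losses under the assumed schedule."""
    w = dynamic_weight(epoch, total_epochs)
    return (w * inheritance_loss(teacher_scores, inherit_scores, temperature)
            + (1.0 - w) * exploration_loss(explore_scores, positive_index, temperature))


if __name__ == "__main__":
    # Toy scores for one query against three candidate videos; video 0 is relevant.
    teacher = [0.9, 0.2, 0.1]
    inherit = [0.8, 0.3, 0.1]
    explore = [0.7, 0.1, 0.4]
    for epoch in range(3):
        print(epoch, round(total_loss(teacher, inherit, explore, 0, epoch, 3), 4))
```

The linear decay is only one plausible choice for "adjusting the effect of each student branch"; the paper's actual schedule, loss terms, and the way the two branches' scores are combined at retrieval time may be different.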
Pages: 11268-11278
Page count: 11