Source-free Temporal Attentive Domain Adaptation for Video Action Recognition

Cited by: 5
Authors
Chen, Peipeng [1]
Ma, Andy J. [1,2,3]
Affiliations
[1] Sun Yat Sen Univ, Guangzhou, Peoples R China
[2] Minist Educ, Guangdong Prov Key Lab Informat Secur Technol, Beijing, Peoples R China
[3] Minist Educ, Key Lab Machine Intelligence & Adv Comp, Beijing, Peoples R China
Source
PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2022 | 2022
Funding
National Natural Science Foundation of China;
Keywords
source-free domain adaptation; action recognition; temporal attentive aggregation;
DOI
10.1145/3512527.3531392
CLC Classification Number
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
With the rapid growth of video data, many video analysis techniques have been developed and achieved success in recent years. To mitigate the distribution bias of video data across domains, unsupervised video domain adaptation (UVDA) has been proposed and has become an active research topic. Nevertheless, existing UVDA methods need access to source domain data during training, which may lead to privacy policy violations and transfer inefficiency. To address this issue, we propose a novel source-free temporal attentive domain adaptation (SFTADA) method for video action recognition under the more challenging UVDA setting in which source domain data is not available when learning the target domain. In our method, an innovative Temporal Attentive aGgregation (TAG) module is designed to combine frame-level features with varying importance weights into a video-level representation. Since neither source domain data nor target domain label information is available during adaptation and testing, an MLP-based attention network is trained to approximate the attentive aggregation function based on class centroids. By minimizing frame-level and video-level loss functions, both the temporal and spatial domain shifts in cross-domain video data are reduced. Extensive experiments on four benchmark datasets demonstrate the effectiveness of the proposed method in solving the challenging source-free UVDA task.
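For illustration, the sketch below shows one plausible form of the temporal attentive aggregation described in the abstract: a small MLP scores each frame-level feature, and a softmax over the scores yields the importance weights used to pool frames into a video-level representation. All names, layer sizes, and the overall structure here are assumptions for exposition, not the authors' released implementation.

```python
# Minimal sketch of MLP-based temporal attentive aggregation over frame features.
# Names and hyperparameters are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn


class TemporalAttentiveAggregation(nn.Module):
    """Scores each frame feature with a small MLP and returns the
    attention-weighted video-level representation."""

    def __init__(self, feat_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.score_mlp = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (batch, num_frames, feat_dim)
        scores = self.score_mlp(frame_feats)             # (batch, num_frames, 1)
        weights = torch.softmax(scores, dim=1)           # per-frame importance
        video_feat = (weights * frame_feats).sum(dim=1)  # (batch, feat_dim)
        return video_feat


if __name__ == "__main__":
    # Example: 4 videos, 16 sampled frames, 2048-dim frame features.
    agg = TemporalAttentiveAggregation(feat_dim=2048)
    frames = torch.randn(4, 16, 2048)
    print(agg(frames).shape)  # torch.Size([4, 2048])
```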
Pages: 489-497
Page count: 9