A Novel Multiple-View Adversarial Learning Network for Unsupervised Domain Adaptation Action Recognition

Cited by: 22
Authors
Gao, Zan [1, 2]
Zhao, Yibo [3 ]
Zhang, Hua [3 ]
Chen, Da [2 ]
Liu, An-An [4 ]
Chen, Shengyong [3 ]
Affiliations
[1] Tianjin Univ Technol, Tianjin 300384, Peoples R China
[2] Qilu Univ Technol, Shandong Acad Sci, Shandong Artificial Intelligence Inst, Jinan 250014, Peoples R China
[3] Tianjin Univ Technol, Minist Educ, Key Lab Comp Vis & Syst, Tianjin 300384, Peoples R China
[4] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multiple-view adversarial learning; robust spatiotemporal feature extraction; self-attention mechanism fusion; unsupervised domain adaptation action recognition; ALIGNMENT;
DOI
10.1109/TCYB.2021.3105637
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Domain adaptation action recognition is a hot research topic in machine learning, and some effective approaches have been proposed. However, these approaches often require labeled samples in the target domain. Moreover, domain-invariant discriminative feature learning, feature fusion, and classifier module learning have not been explored in an end-to-end framework. Thus, in this study, we propose a novel end-to-end multiple-view adversarial learning network (MAN) for unsupervised domain adaptation action recognition, in which the fusion of RGB and optical-flow features, domain-invariant discriminative feature learning, and action recognition are conducted in a unified framework. Specifically, a robust spatiotemporal feature extraction network, including a spatial transform network and an adaptive intrachannel weight network, is proposed to improve the scale invariance and robustness of the method. Then, a self-attention mechanism fusion module is designed to adaptively fuse the RGB and optical-flow features. Moreover, a multiview adversarial learning loss is developed to obtain domain-invariant discriminative features. In addition, three benchmark datasets are constructed for unsupervised domain adaptation action recognition, for which all actions and samples are carefully collected from public action datasets, and their action categories are hierarchically augmented, which can guide how to extend existing action datasets. We conduct extensive experiments on four benchmark datasets, and the experimental results demonstrate that our proposed MAN can outperform several state-of-the-art unsupervised domain adaptation action recognition approaches. When the SDAI Action II-6 and SDAI Action II-11 datasets are used, MAN can achieve 3.7% (H -> U) and 6.1% (H -> U) improvements over the temporal attentive adversarial adaptation network (published in ICCV 2019) module, respectively.
As an added contribution, the SDAI Action II-6, SDAI Action II-11, and SDAI Action II-16 datasets will be released to facilitate future research on domain adaptation action recognition.
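The abstract does not specify how the self-attention fusion module is built, so the following is only a minimal sketch of the general idea of attention-weighted fusion of two modality features, not the authors' implementation. The function names `softmax` and `attention_fuse`, the single `query` vector (which would be a learned parameter in a real network), and the scaled dot-product scoring are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(rgb_feat, flow_feat, query):
    """Fuse RGB and optical-flow feature vectors with attention weights.

    Hypothetical illustration: each modality is scored by a scaled dot
    product with `query`, the two scores are normalized with softmax,
    and the fused feature is the weighted sum of the two inputs.
    """
    if not (len(rgb_feat) == len(flow_feat) == len(query)):
        raise ValueError("all vectors must have the same length")
    scale = math.sqrt(len(query))
    scores = [
        sum(q * x for q, x in zip(query, feat)) / scale
        for feat in (rgb_feat, flow_feat)
    ]
    w_rgb, w_flow = softmax(scores)
    fused = [w_rgb * r + w_flow * f for r, f in zip(rgb_feat, flow_feat)]
    return fused, (w_rgb, w_flow)


if __name__ == "__main__":
    fused, weights = attention_fuse([1.0, 0.0], [0.0, 1.0], [2.0, 0.0])
    print("weights:", weights)
    print("fused:", fused)
```

In this sketch the modality whose feature aligns better with the query receives the larger weight, which is one simple way to make the fusion adaptive per sample rather than using a fixed average.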
Pages: 13197-13211
Number of pages: 15