Self-Supervised Learning for Action Recognition by Video Denoising

Cited by: 0
Authors
Thi Thu Trang Phung [1]
Thi Hong Thu Ma [2]
Van Truong Nguyen [3]
Duc Quang Vu [4]
Affiliations
[1] Thai Nguyen Univ, Thai Nguyen, Vietnam
[2] Tan Trao Univ, Tuyen Quang, Vietnam
[3] Thai Nguyen Univ Educ, Thai Nguyen, Vietnam
[4] Natl Cent Univ, Dept CSIE, Taoyuan, Taiwan
Source
2021 RIVF INTERNATIONAL CONFERENCE ON COMPUTING AND COMMUNICATION TECHNOLOGIES (RIVF 2021) | 2021
DOI
10.1109/RIVF51545.2021.9642129
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Deep learning is a data-hungry technique that is most effective when applied to large datasets. However, large-scale annotated datasets are not always available. Self-supervised learning, in which labels can be generated automatically, is therefore essential, and it has emerged as a new approach among state-of-the-art methods. In this paper, we introduce a new self-supervised method called video denoising. The method trains an autoencoder to restore the original videos. A second model, called the discriminator, is used to evaluate the quality of the videos output by the autoencoder. By reconstructing videos, the autoencoder learns both the spatial and temporal relations among video frames, which makes the downstream task easier to handle. Our experiments show that the model transfers well to the action recognition task and outperforms state-of-the-art methods on the UCF-101 and HMDB-51 datasets.
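The idea that the training target comes for free can be illustrated with a minimal sketch. It only shows how a (noisy input, clean target) pair might be built from an unlabeled clip and how a reconstruction error could be measured. The Gaussian noise model, the `sigma` value, the nested-list clip layout, and the helper names `add_gaussian_noise` and `mse` are all assumptions made for illustration; the abstract does not specify the corruption, the loss, or the network architectures, and no autoencoder or discriminator is implemented here.

```python
import random


def add_gaussian_noise(clip, sigma=0.1, seed=0):
    """Return a noisy copy of a clip (frames x rows x pixels, values in [0, 1]).

    Gaussian noise is an assumed corruption, chosen only for illustration.
    """
    rng = random.Random(seed)
    return [
        [[min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
         for row in frame]
        for frame in clip
    ]


def mse(a, b):
    """Mean squared error between two clips of identical shape."""
    total, count = 0.0, 0
    for frame_a, frame_b in zip(a, b):
        for row_a, row_b in zip(frame_a, frame_b):
            for px_a, px_b in zip(row_a, row_b):
                total += (px_a - px_b) ** 2
                count += 1
    return total / count


# A toy clip of 2 grayscale frames, each 2x2, stands in for a real video.
clean = [[[0.2, 0.4], [0.6, 0.8]], [[0.3, 0.5], [0.7, 0.9]]]
noisy = add_gaussian_noise(clean, sigma=0.1, seed=0)

# The training pair needs no manual label: the target is the clean clip itself.
pair = (noisy, clean)

# Error of the identity mapping; a trained denoiser would aim to go below this.
baseline_loss = mse(noisy, clean)
```

In a full pipeline, a model would take `noisy` as input and be trained to reduce its reconstruction error against `clean`, with a separate quality signal from a discriminator as the abstract describes.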
Pages: 76-81
Number of pages: 6