Self-Supervised Learning for Action Recognition by Video Denoising

Cited by: 0
Authors
Thi Thu Trang Phung [1]
Thi Hong Thu Ma [2]
Van Truong Nguyen [3]
Duc Quang Vu [4]
Affiliations
[1] Thai Nguyen Univ, Thai Nguyen, Vietnam
[2] Tan Trao Univ, Tuyen Quang, Vietnam
[3] Thai Nguyen Univ Educ, Thai Nguyen, Vietnam
[4] Natl Cent Univ, Dept CSIE, Taoyuan, Taiwan
Source
2021 RIVF INTERNATIONAL CONFERENCE ON COMPUTING AND COMMUNICATION TECHNOLOGIES (RIVF 2021) | 2021
DOI
10.1109/RIVF51545.2021.9642129
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Deep learning is a data-hungry technique that is most effective when applied to large datasets. However, large-scale annotated datasets are not always available. Self-supervised learning, in which labels are generated automatically, is therefore an essential alternative and has become a new direction for state-of-the-art methods. In this paper, we introduce a new self-supervised method, namely video denoising. The method trains an autoencoder to restore the original videos. A second model, called the discriminator, is proposed to evaluate the quality of the videos output by the autoencoder. By reconstructing videos, the autoencoder learns both the spatial and temporal relations of video frames, which makes the downstream task easier to handle. In our experiments, we demonstrate that our model transfers well to the action recognition task and outperforms state-of-the-art methods on the UCF-101 and HMDB-51 datasets.
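
The abstract's core idea, that labels can be generated automatically, can be illustrated with a minimal sketch of how a denoising pretext pair might be built: the clean clip serves as its own target, and a corrupted copy is the input. The abstract does not specify the noise model, the loss, or the network architectures, so the Gaussian noise, the clamping to [0, 1], the mean squared error, and all function names below are assumptions for illustration only. The autoencoder and discriminator themselves are not shown.

```python
import random


def add_gaussian_noise(clip, sigma=0.1, seed=None):
    """Return a noisy copy of `clip` (frames x height x width, values in [0, 1]).

    Gaussian noise and clamping to [0, 1] are assumptions, not the paper's setup.
    """
    rng = random.Random(seed)
    return [
        [[min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
         for row in frame]
        for frame in clip
    ]


def mse(a, b):
    """Mean squared error between two clips of the same shape."""
    total, count = 0.0, 0
    for frame_a, frame_b in zip(a, b):
        for row_a, row_b in zip(frame_a, frame_b):
            for px_a, px_b in zip(row_a, row_b):
                total += (px_a - px_b) ** 2
                count += 1
    return total / count


def make_pretext_pair(clip, sigma=0.1, seed=None):
    """Build one (input, target) pair; the clean clip is its own label."""
    return add_gaussian_noise(clip, sigma, seed), clip


# A tiny 2-frame, 2x2 grayscale "video" used only as a stand-in for real data.
clean = [[[0.2, 0.4], [0.6, 0.8]], [[0.3, 0.5], [0.7, 0.9]]]
noisy, target = make_pretext_pair(clean, sigma=0.1, seed=0)

identity_loss = mse(noisy, target)   # loss if a model returned its input unchanged
perfect_loss = mse(target, target)   # loss of a perfect reconstruction
```

In this sketch, a model that simply copies its input pays `identity_loss`, while a perfect reconstruction pays zero, so no human annotation is needed to define the training signal.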
Pages: 76-81
Number of pages: 6