Self-Supervised Feature Enhancement: Applying Internal Pretext Task to Supervised Learning

Times Cited: 1
Authors
Xie, Tianshu [1 ]
Yang, Yuhang [2 ]
Ding, Zilin [2 ]
Cheng, Xuan [2 ]
Wang, Xiaomin [2 ]
Gong, Haigang [2 ]
Liu, Ming [2 ,3 ]
Affiliations
[1] Univ Elect Sci & Technol China, Yangtze Delta Reg Inst Quzhou, Quzhou 324003, Peoples R China
[2] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 611731, Peoples R China
[3] Wenzhou Med Univ, Quzhou Affiliated Hosp, Quzhou Peoples Hosp, Quzhou 324000, Peoples R China
Keywords
Task analysis; Training; Self-supervised learning; Visualization; Supervised learning; Semantics; Predictive models; Deep learning; classification; self-supervised learning; convolutional neural network; feature transformation;
DOI
10.1109/ACCESS.2022.3233104
CLC Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Traditional self-supervised learning requires convolutional neural networks (CNNs) to encode high-level semantic visual representations using external pretext tasks (i.e., image- or video-based tasks). In this paper, we show that feature transformations within CNNs can also serve as supervisory signals for constructing a self-supervised task, which we call the internal pretext task, and that such a task can be applied to enhance supervised learning. Specifically, we first transform the internal feature maps by discarding different channels, and then define an additional internal pretext task of identifying which channels were discarded. The CNN is trained to predict joint labels generated by combining the self-supervised labels with the original labels. In this way, the CNN learns which channels are missing while classifying, in the hope of mining richer feature information. Extensive experiments show that our approach is effective across various models and datasets while incurring only negligible computational overhead. Furthermore, our approach is compatible with other methods and can be combined with them for better results.
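The abstract describes two concrete steps: discarding a subset of feature-map channels to form the self-supervised signal, and fusing the self-supervised label with the original class label into a single joint label. A minimal numpy sketch of those two steps is below; the contiguous-group discarding scheme, the function names, and the `class_label * num_groups + group_idx` label encoding are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def discard_channel_group(feature_map: np.ndarray, group_idx: int,
                          num_groups: int) -> np.ndarray:
    """Zero out one contiguous group of channels in a (C, H, W) feature map.

    Assumption: channels are split into num_groups equal contiguous groups;
    the discarded group's index serves as the self-supervised label.
    """
    c = feature_map.shape[0]
    assert c % num_groups == 0, "channel count must divide evenly into groups"
    size = c // num_groups
    out = feature_map.copy()
    out[group_idx * size:(group_idx + 1) * size] = 0.0  # discard this group
    return out

def joint_label(class_label: int, group_idx: int, num_groups: int) -> int:
    """Combine the original label and the self-supervised label into one
    joint label over (num_classes * num_groups) categories."""
    return class_label * num_groups + group_idx

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fmap = rng.standard_normal((8, 4, 4))  # C=8 channels, 4x4 spatial map
    transformed = discard_channel_group(fmap, group_idx=1, num_groups=4)
    print(transformed[2:4].sum())  # channels 2:4 (group 1) are zeroed
    print(joint_label(class_label=3, group_idx=1, num_groups=4))  # -> 13
```

At training time, each (image, joint label) pair would be fed to a standard classifier whose output layer has `num_classes * num_groups` units, so no extra prediction head is needed.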
Pages: 1708-1717
Page Count: 10
Cited References
33 records
[1]   Video Jigsaw: Unsupervised Learning of Spatiotemporal Context for Video Action Recognition [J].
Ahsan, Unaiza ;
Madhok, Rishi ;
Essa, Irfan .
2019 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2019, :179-189
[2]  
[Anonymous], P CVPR WORKSH FIN GR
[3]   Self-Supervised GANs via Auxiliary Rotation Loss [J].
Chen, Ting ;
Zhai, Xiaohua ;
Ritter, Marvin ;
Lucic, Mario ;
Houlsby, Neil .
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, :12146-12155
[4]  
DeVries T, 2017, Arxiv, DOI arXiv:1708.04552
[5]   Unsupervised Visual Representation Learning by Context Prediction [J].
Doersch, Carl ;
Gupta, Abhinav ;
Efros, Alexei A. .
2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, :1422-1430
[6]  
Gidaris S, 2018, Arxiv, DOI arXiv:1803.07728
[7]   Rich feature hierarchies for accurate object detection and semantic segmentation [J].
Girshick, Ross ;
Donahue, Jeff ;
Darrell, Trevor ;
Malik, Jitendra .
2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2014, :580-587
[8]   Deep Pyramidal Residual Networks [J].
Han, Dongyoon ;
Kim, Jiwhan ;
Kim, Junmo .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :6307-6315
[9]   Deep Residual Learning for Image Recognition [J].
He, Kaiming ;
Zhang, Xiangyu ;
Ren, Shaoqing ;
Sun, Jian .
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :770-778
[10]  
Hinton G, 2015, Arxiv, DOI [arXiv:1503.02531, 10.48550/arXiv.1503.02531]