Video Jigsaw: Unsupervised Learning of Spatiotemporal Context for Video Action Recognition

Cited by: 74
Authors
Ahsan, Unaiza [1]
Madhok, Rishi [2]
Essa, Irfan [1]
Affiliations
[1] Georgia Inst Technol, Atlanta, GA 30332 USA
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Source
2019 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV) | 2019
DOI
10.1109/WACV.2019.00025
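Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract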
We propose a self-supervised learning method to jointly reason about spatial and temporal context for video recognition. Recent self-supervised approaches have used spatial context [9, 34] as well as temporal coherency [32], but combining the two requires extensive preprocessing such as tracking objects through millions of video frames [59] or computing optical flow to determine frame regions with high motion [30]. We propose to combine spatial and temporal context in one self-supervised framework without any heavy preprocessing. We divide multiple video frames into grids of patches and train a network to solve jigsaw puzzles on these patches from multiple frames. The network is thus trained to correctly identify the position of a patch within a video frame as well as the position of a patch over time. We also propose a novel permutation strategy that outperforms random permutations while significantly reducing computational and memory requirements. We use our trained network for transfer learning tasks such as video activity recognition and demonstrate the strength of our approach on two benchmark video action recognition datasets without using a single frame from these datasets for unsupervised pretraining of our proposed video jigsaw network.
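The puzzle construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `split_into_patches` and `make_puzzle`, the frame size, the grid size, and the size of the permutation set are all assumptions chosen for illustration. In particular, the permutation set here is drawn at random, which is the baseline the abstract compares against; the proposed permutation strategy is not described in this record and is not reproduced.

```python
import numpy as np

def split_into_patches(frames, grid):
    """Split frames of shape (T, H, W, C) into T * grid * grid patches."""
    t, h, w, c = frames.shape
    ph, pw = h // grid, w // grid
    # Crop so that height and width divide evenly by the grid size.
    frames = frames[:, :ph * grid, :pw * grid]
    # (T, grid, ph, grid, pw, C) -> (T, grid, grid, ph, pw, C)
    patches = frames.reshape(t, grid, ph, grid, pw, c).transpose(0, 1, 3, 2, 4, 5)
    # Patch k comes from frame k // grid**2, at grid cell k % grid**2.
    return patches.reshape(t * grid * grid, ph, pw, c)

def make_puzzle(frames, grid, permutations, rng):
    """Shuffle patches with one permutation from a fixed set.

    Returns the shuffled patches and the index of the permutation used,
    which serves as the classification target for the network.
    """
    patches = split_into_patches(frames, grid)
    label = int(rng.integers(len(permutations)))
    shuffled = patches[permutations[label]]
    return shuffled, label

# Hypothetical setup: 3 frames of 64x64 RGB, a 2x2 grid per frame.
rng = np.random.default_rng(0)
frames = rng.random((3, 64, 64, 3))
n_patches = 3 * 2 * 2
# Assumption: a small random permutation set (the random baseline).
permutations = np.stack([rng.permutation(n_patches) for _ in range(10)])
shuffled, label = make_puzzle(frames, 2, permutations, rng)
```

Because the patches from all frames are shuffled together, recovering the permutation requires placing each patch both within its frame (spatial context) and across frames (temporal context).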
Pages: 179-189
Page count: 11