Collaborative knowledge distillation for incomplete multi-view action prediction

Cited by: 6
Authors
Kumar, Deepak [1]
Kumar, Chetan [1]
Shao, Ming [1]
Affiliations
[1] Univ Massachusetts, 285 Old Westport Rd, Dartmouth, MA 02747 USA
Keywords
Multi-view; Action prediction; Knowledge distillation; Graph attention
DOI
10.1016/j.imavis.2021.104111
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Predicting future actions is key to visual understanding, surveillance, and human behavior analysis. Current methods for video-based prediction primarily use single-view data, whereas in the real world multiple cameras and the videos they produce are readily available and may benefit action prediction tasks. However, this brings a new challenge: subjects in the videos are more likely to be occluded by objects when captured from different angles, and the videos may suffer from signal jitter in transmission. To that end, in this paper we propose a novel student network called Collaborative Knowledge Distillation (CKD) to predict human actions with missing information under a multi-view setting, i.e., incomplete multi-view action prediction. First, we create a graph attention based teacher model capable of fusing multi-view video features for the prediction task. Second, we construct a corruption pattern bank (CPB) to simulate various missing segments in multi-view video; each student model manages one pattern through privileged information and knowledge distillation. Third, to account for arbitrary missing video segments in the real world, an ensemble of student models is developed to make a joint prediction. The proposed framework has been extensively evaluated on popular multi-view visual action datasets, including PKU-MMD and NTU-RGB, to validate the effectiveness of our approach. To the best of our knowledge, action prediction has not yet been explored in the multi-view setting. (c) 2021 Elsevier B.V. All rights reserved.
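The three steps in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the corruption pattern bank is built, which distillation loss is used, or how the students are combined. The sketch assumes binary view/segment masks enumerated exhaustively, a temperature-softened KL divergence as the distillation loss, and plain probability averaging for the ensemble. All function names are hypothetical, and the graph attention teacher and the privileged information are not modelled.

```python
import itertools
import math


def build_pattern_bank(num_views, num_segments, max_missing):
    """Enumerate binary masks (1 = observed, 0 = missing) over view/segment cells.

    Assumption: exhaustive enumeration of up to `max_missing` dropped cells;
    the paper may construct its corruption pattern bank differently.
    """
    cells = [(v, s) for v in range(num_views) for s in range(num_segments)]
    bank = []
    for k in range(1, max_missing + 1):
        for missing in itertools.combinations(cells, k):
            mask = [[1] * num_segments for _ in range(num_views)]
            for v, s in missing:
                mask[v][s] = 0
            bank.append(mask)
    return bank


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened class distributions.

    Assumption: a standard soft-target loss, used here only for illustration.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def ensemble_predict(student_logits_list):
    """Average the students' class probabilities; return (label, probabilities)."""
    probs = [softmax(logits) for logits in student_logits_list]
    avg = [sum(col) / len(probs) for col in zip(*probs)]
    label = max(range(len(avg)), key=avg.__getitem__)
    return label, avg
```

In this sketch, one student would be trained per mask in the bank, each minimizing `distillation_loss` against the teacher's logits on complete data, and `ensemble_predict` would combine the students at test time when the actual missing pattern is unknown.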
Pages: 10