Graph Distillation for Action Detection with Privileged Modalities

Cited by: 61
Authors
Luo, Zelun [1, 2]
Hsieh, Jun-Ting [1]
Jiang, Lu [2]
Niebles, Juan Carlos [1, 2]
Fei-Fei, Li [1, 2]
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
[2] Google Inc, Mountain View, CA 94043 USA
Source
COMPUTER VISION - ECCV 2018, PT XIV | 2018 / Vol. 11218
DOI
10.1007/978-3-030-01264-9_11
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose a technique that tackles action detection in multimodal videos under a realistic and challenging condition in which only limited training data and partially observed modalities are available. Common methods in transfer learning do not take advantage of the extra modalities potentially available in the source domain. On the other hand, previous work on multimodal learning only focuses on a single domain or task and does not handle the modality discrepancy between training and testing. In this work, we propose a method termed graph distillation that incorporates rich privileged information from a large-scale multimodal dataset in the source domain, and improves the learning in the target domain where training data and modalities are scarce. We evaluate our approach on action classification and detection tasks in multimodal videos, and show that our model outperforms the state-of-the-art by a large margin on the NTU RGB+D and PKU-MMD benchmarks. The code is released at http://alan.vision/eccv18_graph/.
Pages: 174-192
Page count: 19