Social Scene Understanding: End-to-End Multi-Person Action Localization and Collective Activity Recognition

Cited by: 174
Authors
Bagautdinov, Timur [1]
Alahi, Alexandre [2]
Fleuret, Francois [1, 3]
Fua, Pascal [1]
Savarese, Silvio [2]
Affiliations
[1] Ecole Polytech Fed Lausanne, Lausanne, Switzerland
[2] Stanford Univ, Stanford, CA 94305 USA
[3] IDIAP Res Inst, Martigny, Switzerland
Source
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017) | 2017
Funding
Swiss National Science Foundation;
Keywords
PEOPLE;
DOI
10.1109/CVPR.2017.365
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present a unified framework for understanding human social behaviors in raw image sequences. Our model jointly detects multiple individuals, infers their social actions, and estimates the collective actions with a single feed-forward pass through a neural network. We propose a single architecture that does not rely on external detection algorithms but rather is trained end-to-end to generate dense proposal maps that are refined via a novel inference scheme. The temporal consistency is handled via a person-level matching Recurrent Neural Network. The complete model takes as input a sequence of frames and outputs detections along with the estimates of individual actions and collective activities. We demonstrate state-of-the-art performance of our algorithm on multiple publicly available benchmarks.
Pages: 3425-3434
Page count: 10
Related papers
45 in total
[1] ABADI M, 2015, TENSORFLOW LARGE SCA, DOI 10.48550/ARXIV.1605.08695
[2] Amer MR, 2014, LECT NOTES COMPUT SC, V8694, P572, DOI 10.1007/978-3-319-10599-4_37
[3] [Anonymous], 2015, PROC CVPR IEEE, DOI 10.1109/CVPR.2015.7298642
[4] Bagautdinov T, 2015, PROC CVPR IEEE, P2829, DOI 10.1109/CVPR.2015.7298900
[5] Baque P., 2016, IEEE C COMP VIS PATT
[6]   On Detection of Multiple Object Instances Using Hough Transforms [J].
Barinova, Olga ;
Lempitsky, Victor ;
Kohli, Pushmeet .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2012, 34 (09) :1773-1784
[7]   Understanding Collective Activities of People from Videos [J].
Choi, Wongun ;
Savarese, Silvio .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2014, 36 (06) :1242-1257
[8] Choi W, 2012, LECT NOTES COMPUT SC, V7575, P215, DOI 10.1007/978-3-642-33765-9_16
[9] Chung Junyoung, 2014, Empirical evaluation of gated recurrent neural networks on sequence modeling
[10]   Histograms of oriented gradients for human detection [J].
Dalal, N ;
Triggs, B .
2005 IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, VOL 1, PROCEEDINGS, 2005, :886-893