Violence Detection Based on Spatio-Temporal Feature and Fisher Vector

Cited by: 1
Authors
Cai, Huangkai [1]
Jiang, He [1]
Huang, Xiaolin [1]
Yang, Jie [1]
He, Xiangjian [2]
Affiliations
[1] Shanghai Jiao Tong Univ, Inst Image Proc & Pattern Recognit, Shanghai, Peoples R China
[2] Univ Technol Sydney, Sch Elect & Data Engn, Ultimo, Australia
Source
PATTERN RECOGNITION AND COMPUTER VISION (PRCV 2018), PT I | 2018 / Vol. 11256
Keywords
Violence detection; Dense Trajectories; MPEG flow video descriptor; Fisher Vector; Linear support vector machine;
DOI
10.1007/978-3-030-03398-9_16
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A novel framework based on local spatio-temporal features and a Bag-of-Words (BoW) model is proposed for violence detection. The framework uses Dense Trajectories (DT) and the MPEG flow video descriptor (MF) as feature descriptors and employs Fisher Vector (FV) for feature coding. DT and MF are more descriptive and robust because each combines several feature descriptors that describe trajectory shape, appearance, motion and motion boundaries, respectively. FV transforms low-level features into high-level features. It preserves much of the information, because it not only finds the affiliations of descriptors in the codebook but also uses first- and second-order statistics to represent videos. Several techniques, namely PCA, K-means++ and the choice of codebook size, are used to improve the final video classification performance. The proposed method is analysed with respect to accuracy, speed and application scenarios. Experimental results show that the proposed approach outperforms state-of-the-art approaches for violence detection in both crowd and non-crowd scenes.
Pages: 180-190
Number of pages: 11
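
The abstract says that Fisher Vector coding represents a video through first- and second-order statistics of its descriptors relative to a codebook. The sketch below illustrates that idea only and is not the authors' implementation. It assumes the codebook is a diagonal-covariance Gaussian mixture that has already been fitted, and it applies power and L2 normalization as commonly done for Fisher Vectors; neither detail is stated in the abstract. The function names and the toy mixture parameters are hypothetical.

```python
import math


def gmm_posteriors(x, weights, means, sigmas):
    """Soft-assign one descriptor x to each diagonal-Gaussian component."""
    log_p = []
    for w, mu, sd in zip(weights, means, sigmas):
        ll = math.log(w)
        for xi, m, s in zip(x, mu, sd):
            ll += -0.5 * math.log(2.0 * math.pi * s * s) - 0.5 * ((xi - m) / s) ** 2
        log_p.append(ll)
    top = max(log_p)  # subtract the maximum for numerical stability
    unnorm = [math.exp(v - top) for v in log_p]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def fisher_vector(descriptors, weights, means, sigmas):
    """Encode a set of descriptors as a vector of length 2 * K * D."""
    n, k, d = len(descriptors), len(weights), len(means[0])
    g_mu = [[0.0] * d for _ in range(k)]  # first-order statistics
    g_sd = [[0.0] * d for _ in range(k)]  # second-order statistics
    for x in descriptors:
        gamma = gmm_posteriors(x, weights, means, sigmas)
        for j in range(k):
            for i in range(d):
                z = (x[i] - means[j][i]) / sigmas[j][i]
                g_mu[j][i] += gamma[j] * z
                g_sd[j][i] += gamma[j] * (z * z - 1.0)
    fv = []
    for j in range(k):
        fv.extend(v / (n * math.sqrt(weights[j])) for v in g_mu[j])
    for j in range(k):
        fv.extend(v / (n * math.sqrt(2.0 * weights[j])) for v in g_sd[j])
    # Assumed post-processing: signed square root, then L2 normalization.
    fv = [math.copysign(math.sqrt(abs(v)), v) for v in fv]
    norm = math.sqrt(sum(v * v for v in fv))
    return [v / norm for v in fv] if norm > 0 else fv


# Toy example: K = 2 components, D = 2 dimensions (hypothetical values).
weights = [0.5, 0.5]
means = [[0.0, 0.0], [3.0, 3.0]]
sigmas = [[1.0, 1.0], [1.0, 1.0]]
descriptors = [[0.1, -0.2], [2.9, 3.2], [0.3, 0.1]]
fv = fisher_vector(descriptors, weights, means, sigmas)
print(len(fv))
```

In a pipeline like the one the abstract describes, the descriptors would be PCA-reduced DT or MF features and the resulting vector would be passed to a linear SVM; those steps are outside this sketch.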