Feature Aggregation Tree: Capture Temporal Motion Information for Action Recognition in Videos

Cited by: 0
Authors
Zhu, Bing [1]
Affiliations
[1] Beijing Inst Technol BIT, Sch Comp Sci, Beijing Lab Intelligent Informat Technol, Beijing 100081, Peoples R China
Source
PATTERN RECOGNITION AND COMPUTER VISION, PT III | 2018 / Vol. 11258
Keywords
Action recognition; Feature learning; Feature aggregation
DOI
10.1007/978-3-030-03338-5_27
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose a model named Feature Aggregation Tree to capture the temporal motion information in videos for action recognition. Feature Aggregation Tree constructs a logical motion sequence by considering the concrete semantics of features and mining feature combinations in a video. It stores the different feature combinations and then uses a Bayesian model to compute the conditional probability of each frame-level feature given the preceding features, and aggregates the features accordingly. The model does not depend on the length of the video. Compared with existing feature aggregation methods that try to enhance the descriptive capacity of features, our model has the following advantages: (i) It considers the temporal motion information in a video and predicts conditional probabilities with a Bayesian model. (ii) It handles videos of arbitrary length without resorting to uniform sampling or feature encoding. (iii) It is compact and efficient compared to other encoding methods, and achieves significant results compared to baseline methods. Experiments on the UCF101 and HMDB51 datasets demonstrate the effectiveness of our method.
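The abstract does not give the construction of the tree or the exact Bayesian formulation, so the following is only a minimal illustrative sketch of the general idea, not the authors' method. It assumes that each frame-level feature has already been mapped to a discrete symbol (for example a cluster index), estimates a first-order conditional probability P(current | previous) from counted transitions with Laplace smoothing, and uses those probabilities as weights to pool frame features into one fixed-length vector. The function names `fit_transitions`, `cond_prob`, and `aggregate`, the first-order simplification, and the weighting scheme are all assumptions made for illustration.

```python
def fit_transitions(sequences):
    """Count how often symbol `cur` directly follows symbol `prev`.

    `sequences` is a list of symbol sequences, one per training video.
    Returns a nested dict: counts[prev][cur] -> int.
    """
    counts = {}
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            row = counts.setdefault(prev, {})
            row[cur] = row.get(cur, 0) + 1
    return counts


def cond_prob(counts, prev, cur, vocab_size, alpha=1.0):
    """Laplace-smoothed estimate of P(cur | prev) (assumed first-order model)."""
    row = counts.get(prev, {})
    total = sum(row.values())
    return (row.get(cur, 0) + alpha) / (total + alpha * vocab_size)


def aggregate(features, symbols, counts, vocab_size):
    """Pool per-frame feature vectors into one vector of fixed dimension.

    Each frame is weighted by the conditional probability of its symbol
    given the previous frame's symbol; the first frame gets weight 1.0
    (an arbitrary choice for this sketch). The output dimension equals
    the feature dimension, whatever the number of frames.
    """
    if not features:
        raise ValueError("at least one frame is required")
    weights = [1.0]
    for prev, cur in zip(symbols, symbols[1:]):
        weights.append(cond_prob(counts, prev, cur, vocab_size))
    dim = len(features[0])
    total = sum(weights)
    return [
        sum(w * f[d] for w, f in zip(weights, features)) / total
        for d in range(dim)
    ]


# Toy usage: two training videos over a vocabulary of 3 symbols.
train = [[0, 1, 2], [0, 1, 1]]
counts = fit_transitions(train)
video_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
video_symbols = [0, 1, 2]
pooled = aggregate(video_features, video_symbols, counts, vocab_size=3)
```

Because the pooled vector has the same dimension as a single frame feature, videos with different numbers of frames map to representations of the same size, which mirrors the arbitrary-length property claimed in the abstract.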
Pages: 316-327
Page count: 12