Multi-region Two-Stream R-CNN for Action Detection

Cited by: 243

Authors:

Peng, Xiaojiang [1]

Schmid, Cordelia [1]

Affiliations:

[1] Inria, Thoth Team, Lab Jean Kuntzmann, Grenoble, France

Source:

COMPUTER VISION - ECCV 2016, PT IV | 2016 / Vol. 9908

Keywords:

Action detection; Faster R-CNN; Multi-region CNNs; Two stream R-CNN; ACTION RECOGNITION; LOCALIZATION;

DOI:

10.1007/978-3-319-46493-0_45

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We propose a multi-region two-stream R-CNN model for action detection in realistic videos. We start from frame-level action detection based on Faster R-CNN, and make three contributions: (1) we show that a motion region proposal network generates high-quality proposals, which are complementary to those of an appearance region proposal network; (2) we show that stacking optical flow over several frames significantly improves frame-level action detection; and (3) we embed a multi-region scheme in the Faster R-CNN model, which adds complementary information on body parts. We then link frame-level detections with the Viterbi algorithm, and temporally localize an action with the maximum subarray method. Experimental results on the UCF-Sports, J-HMDB and UCF101 action detection datasets show that our approach outperforms the state of the art by a significant margin in both frame-mAP and video-mAP.
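
The abstract names the maximum subarray method for temporal localization but gives no details. The sketch below is only an illustration of that general technique (the linear-time scan often attributed to Kadane), not the authors' implementation. It assumes each frame of a linked track already has a detection score; the function names and the `offset` parameter, which is subtracted so that weak frames count against a span, are hypothetical.

```python
from typing import List, Tuple


def max_subarray(scores: List[float]) -> Tuple[int, int, float]:
    """Return (start, end, total) of the contiguous span with the largest sum.

    Both indices are inclusive. Runs in a single pass over the scores.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    best_sum = cur_sum = scores[0]
    best_start = best_end = cur_start = 0
    for i in range(1, len(scores)):
        if cur_sum < 0:
            # A negative running sum can only hurt: restart the span here.
            cur_sum = scores[i]
            cur_start = i
        else:
            cur_sum += scores[i]
        if cur_sum > best_sum:
            best_sum = cur_sum
            best_start, best_end = cur_start, i
    return best_start, best_end, best_sum


def localize(frame_scores: List[float], offset: float = 0.5) -> Tuple[int, int, float]:
    """Pick the temporal extent of an action within one track.

    `offset` is an assumed value for illustration: frames scoring below it
    contribute negatively, so the best span excludes low-confidence frames.
    """
    shifted = [s - offset for s in frame_scores]
    return max_subarray(shifted)
```

For example, calling `localize([0.1, 0.2, 0.9, 0.8, 0.7, 0.1, 0.3])` selects frames 2 through 4, the run of high-scoring frames. How the offset is chosen, and how per-frame scores are defined, would depend on the detector and is not specified in the abstract.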

Citation

Pages: 744-759

Page count: 16

References: 44 in total
