Sequential Deep Trajectory Descriptor for Action Recognition With Three-Stream CNN

Cited by: 166

Authors:

Shi, Yemin [1]

Tian, Yonghong [1]

Wang, Yaowei [2]

Huang, Tiejun [1]

Affiliations:

[1] Peking Univ, Sch Elect Engn & Comp Sci, Cooperat Medianet Innovat Ctr, Natl Engn Lab Video Technol, Beijing 100871, Peoples R China

[2] Beijing Inst Technol, Sch Informat & Elect, Beijing 100081, Peoples R China

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2017 / Vol. 19 / No. 07

Funding:

National Natural Science Foundation of China;

Keywords:

Action recognition; sequential deep trajectory descriptor (sDTD); three-stream framework; long-term motion;

DOI:

10.1109/TMM.2017.2666540

Chinese Library Classification:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

Learning the spatial-temporal representation of motion information is crucial to human action recognition. Nevertheless, most of the existing features or descriptors cannot capture motion information effectively, especially for long-term motion. To address this problem, this paper proposes a long-term motion descriptor called sequential deep trajectory descriptor (sDTD). Specifically, we project dense trajectories into two-dimensional planes, and subsequently a CNN-RNN network is employed to learn an effective representation for long-term motion. Unlike the popular two-stream ConvNets, the sDTD stream is introduced into a three-stream framework so as to identify actions from a video sequence. Consequently, this three-stream framework can simultaneously capture static spatial features, short-term motion, and long-term motion in the video. Extensive experiments were conducted on three challenging datasets: KTH, HMDB51, and UCF101. Experimental results show that our method achieves state-of-the-art performance on the KTH and UCF101 datasets, and is comparable to the state-of-the-art methods on the HMDB51 dataset.
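As a rough illustration of the projection step mentioned in the abstract, the sketch below rasterizes a set of point trajectories onto a single two-dimensional plane by counting how many trajectory points fall on each pixel. This is only an assumed, simplified reading of "project dense trajectories into two-dimensional planes": the function name `project_trajectories`, the input layout, and the point-count encoding are hypothetical and are not taken from the paper, which may use a different projection.

```python
import numpy as np


def project_trajectories(trajectories, height, width):
    """Rasterize trajectories onto a 2D plane as a point-count image.

    Hypothetical helper, not the paper's implementation.
    trajectories: array-like of shape (num_traj, traj_len, 2) with (x, y)
    pixel coordinates. Points outside the plane are ignored.
    """
    plane = np.zeros((height, width), dtype=np.float32)
    pts = np.asarray(trajectories, dtype=np.float64).reshape(-1, 2)
    xs = np.rint(pts[:, 0]).astype(int)
    ys = np.rint(pts[:, 1]).astype(int)
    inside = (xs >= 0) & (xs < width) & (ys >= 0) & (ys < height)
    # np.add.at accumulates correctly when several points hit the same pixel.
    np.add.at(plane, (ys[inside], xs[inside]), 1.0)
    return plane


# Two short trajectories; the point (5, 5) lies outside the 3x3 plane.
demo = [[[0, 0], [1, 1]], [[1, 1], [5, 5]]]
image = project_trajectories(demo, height=3, width=3)
```

An image of this kind could then be fed to a convolutional network, with a recurrent network consuming the per-segment features in sequence, which is the general CNN-RNN arrangement the abstract describes.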


Pages: 1510-1520

Page count: 11

