Multi-task CNN Model for Action Detection

Cited by: 0

Authors:

Chen, Xin [1]

Han, Yahong [1]

Affiliations:

[1] Tianjin Univ, Sch Comp Sci & Technol, Tianjin, Peoples R China

Source:

2018 IEEE INTERNATIONAL CONFERENCE ON VISUAL COMMUNICATIONS AND IMAGE PROCESSING (IEEE VCIP) | 2018

Keywords:

Action detection; feature fusion; YOLO; 3D convolutional neural networks; multi-task learning

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Action detection is a challenging task because it requires localizing actions of interest in both space and time. In this paper, a multi-task CNN model (MTCNN) that employs both spatial and temporal modules is proposed to solve this task. Specifically, the spatial module fuses the appearance and motion information of frames, which helps to regress the action bounding boxes in every frame more accurately, while the temporal module uses a 3D ConvNet, which can effectively capture the temporal correlation between frames and thus predict the time interval of an action more precisely. Moreover, the two modules share information before their final outputs and are trained simultaneously. Experiments on the UCF101-24 and J-HMDB-21 datasets demonstrate that the proposed pipeline outperforms most state-of-the-art methods.
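
The abstract states that the spatial and temporal modules are trained simultaneously but does not give the objective. One common way to train two task heads jointly is to minimize a weighted sum of per-task losses. The sketch below illustrates that idea only. It is an assumption, not the authors' formulation: the choice of smooth-L1, the `MultiTaskLoss` class, and both weights are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


def smooth_l1(pred: Sequence[float], target: Sequence[float], beta: float = 1.0) -> float:
    """Mean smooth-L1 distance between two equal-length vectors."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        # Quadratic near zero, linear for large errors.
        total += 0.5 * diff * diff / beta if diff < beta else diff - 0.5 * beta
    return total / len(pred)


@dataclass
class MultiTaskLoss:
    """Hypothetical joint objective: weighted sum of a spatial and a temporal loss."""

    # Assumed task weights; the abstract does not say how the tasks are balanced.
    spatial_weight: float = 1.0
    temporal_weight: float = 1.0

    def __call__(
        self,
        box_pred: Sequence[float],
        box_true: Sequence[float],
        interval_pred: Sequence[float],
        interval_true: Sequence[float],
    ) -> float:
        # Spatial task: per-frame bounding box, e.g. (x, y, w, h).
        spatial = smooth_l1(box_pred, box_true)
        # Temporal task: action interval, e.g. (start, end).
        temporal = smooth_l1(interval_pred, interval_true)
        return self.spatial_weight * spatial + self.temporal_weight * temporal
```

Because both terms feed a single scalar, one backward pass would update the layers the two modules share as well as each task-specific head, which is the usual motivation for joint rather than stage-wise training.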

Pages: 4