Multi-task CNN Model for Action Detection

Cited by: 0
Authors
Chen, Xin [1]
Han, Yahong [1]
Affiliation
[1] Tianjin Univ, Sch Comp Sci & Technol, Tianjin, Peoples R China
Source
2018 IEEE INTERNATIONAL CONFERENCE ON VISUAL COMMUNICATIONS AND IMAGE PROCESSING (IEEE VCIP) | 2018
Keywords
Action detection; feature fusion; YOLO; 3D convolutional neural networks; multi-task learning;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Action detection is a challenging task because it requires localizing actions of interest in both space and time. In this paper, we propose a multi-task CNN model (MTCNN) that employs both a spatial module and a temporal module to solve this task. Specifically, the spatial module fuses the appearance and motion information of frames, which helps regress the action bounding box in every frame more accurately. The temporal module uses a 3D ConvNet that effectively captures the temporal correlation between frames and thus predicts the time interval of the action more precisely. Moreover, the two modules share information before their final outputs and are trained simultaneously. Experiments on the UCF101-24 and J-HMDB-21 datasets demonstrate that our proposed pipeline outperforms most state-of-the-art methods.
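The abstract combines three ingredients: fusing appearance and motion features, a 3D convolution that mixes information across consecutive frames, and joint training of a per-frame box-regression task with a time-interval task. The framework-free Python sketch below illustrates each ingredient under stated assumptions; it is not the authors' implementation. The abstract does not specify the fusion operator, the loss functions, or the task weights, so concatenation as the fusion, smooth L1 for both losses, and the names `fuse`, `conv3d_valid`, `smooth_l1`, `multitask_loss`, `w_spatial` and `w_temporal` are all hypothetical choices made for illustration.

```python
"""Toy sketch of the two-branch multi-task idea (hypothetical; not the MTCNN code)."""
from typing import List

Volume = List[List[List[float]]]  # single-channel clip indexed [time][row][col]


def fuse(appearance: List[float], motion: List[float]) -> List[float]:
    """Assumed fusion: concatenate appearance (RGB) and motion (optical-flow) features."""
    return appearance + motion


def conv3d_valid(video: Volume, kernel: Volume) -> Volume:
    """Naive single-channel 3D convolution (cross-correlation, no padding, stride 1).

    The kernel spans kt consecutive frames, so every output value mixes
    information from several frames; this is how a 3D ConvNet can model
    temporal correlation, unlike a per-frame 2D convolution.
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out: Volume = []
    for t in range(T - kt + 1):
        plane = []
        for i in range(H - kh + 1):
            row = []
            for j in range(W - kw + 1):
                s = 0.0
                for dt in range(kt):
                    for di in range(kh):
                        for dj in range(kw):
                            s += video[t + dt][i + di][j + dj] * kernel[dt][di][dj]
                row.append(s)
            plane.append(row)
        out.append(plane)
    return out


def smooth_l1(pred: List[float], target: List[float]) -> float:
    """Smooth L1 (Huber loss with delta = 1), summed over coordinates."""
    total = 0.0
    for p, g in zip(pred, target):
        d = abs(p - g)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total


def multitask_loss(
    pred_boxes: List[List[float]],
    gt_boxes: List[List[float]],
    pred_interval: List[float],
    gt_interval: List[float],
    w_spatial: float = 1.0,
    w_temporal: float = 1.0,
) -> float:
    """Assumed joint objective: weighted sum of the mean per-frame box loss
    (spatial task) and the (start, end) interval loss (temporal task)."""
    spatial = sum(smooth_l1(p, g) for p, g in zip(pred_boxes, gt_boxes)) / len(gt_boxes)
    temporal = smooth_l1(pred_interval, gt_interval)
    return w_spatial * spatial + w_temporal * temporal


if __name__ == "__main__":
    # A 1x1 "pixel" lit only in frame 0; a kernel two frames deep carries that
    # signal into output frame 0 but not into output frame 1.
    clip = [[[1.0]], [[0.0]], [[0.0]]]
    print(conv3d_valid(clip, [[[1.0]], [[1.0]]]))
    # One of two boxes is off by 2 in one coordinate, and the start time is off by 2.
    print(multitask_loss([[0, 0, 1, 1], [0, 0, 1, 1]], [[0, 0, 1, 1], [0, 0, 1, 3]], [2, 10], [0, 10]))
```

Collapsing both task losses into one scalar is what allows a single backward pass to update any layers the two branches share, which matches the abstract's description of modules that share information and are trained simultaneously. The relative weights are a tuning choice and are not given in the abstract.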
Pages: 4