Actions as Moving Points

Cited by: 99
Authors
Li, Yixuan [1]
Wang, Zixu [1]
Wang, Limin [1]
Wu, Gangshan [1]
Affiliations
[1] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Peoples R China
Source
COMPUTER VISION - ECCV 2020, PT XVI | 2020 / Vol. 12361
Funding
US National Science Foundation;
Keywords
Spatio-temporal action detection; Anchor-free detection;
DOI
10.1007/978-3-030-58517-4_5
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The existing action tubelet detectors often depend on heuristic anchor design and placement, which might be computationally expensive and sub-optimal for precise localization. In this paper, we present a conceptually simple, computationally efficient, and more precise action tubelet detection framework, termed as MovingCenter Detector (MOC-detector), by treating an action instance as a trajectory of moving points. Based on the insight that movement information could simplify and assist action tubelet detection, our MOC-detector is composed of three crucial head branches: (1) Center Branch for instance center detection and action recognition, (2) Movement Branch for movement estimation at adjacent frames to form trajectories of moving points, (3) Box Branch for spatial extent detection by directly regressing bounding box size at each estimated center. These three branches work together to generate the tubelet detection results, which could be further linked to yield video-level tubes with a matching strategy. Our MOC-detector outperforms the existing state-of-the-art methods for both metrics of frame-mAP and video-mAP on the JHMDB and UCF101-24 datasets. The performance gap is more evident for higher video IoU, demonstrating that our MOC-detector is particularly effective for more precise action detection. We provide the code at https://github.com/MCG-NJU/MOC-Detector.
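The way the three branches combine can be illustrated with a minimal sketch. It is not the authors' implementation (see the repository linked above for that); it only assumes that the Center Branch yields one key-frame center, the Movement Branch yields a per-frame (dx, dy) offset from that center, and the Box Branch yields a per-frame (width, height). The function name `decode_tubelet` and its parameters are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def decode_tubelet(
    key_center: Point,
    movements: List[Point],
    sizes: List[Point],
) -> List[Box]:
    """Turn one detected center into a tubelet (one box per frame).

    Assumptions (not taken from the paper's code):
    - key_center is the instance center on the key frame,
    - movements[i] is the (dx, dy) offset of frame i's center
      relative to key_center,
    - sizes[i] is the (width, height) regressed for frame i.
    """
    if len(movements) != len(sizes):
        raise ValueError("need one movement and one size per frame")

    boxes: List[Box] = []
    for (dx, dy), (w, h) in zip(movements, sizes):
        # Move the key-frame center to this frame's estimated center.
        cx, cy = key_center[0] + dx, key_center[1] + dy
        # Expand the moved center into a box using the regressed size.
        boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes


if __name__ == "__main__":
    tubelet = decode_tubelet(
        key_center=(10.0, 20.0),
        movements=[(-1.0, 0.0), (0.0, 0.0), (2.0, 1.0)],
        sizes=[(4.0, 6.0), (4.0, 6.0), (4.0, 6.0)],
    )
    for frame_index, box in enumerate(tubelet):
        print(frame_index, box)
```

Because each box is derived from a moved point plus a size, no anchor boxes are needed in this sketch, which is the anchor-free idea the abstract describes. Linking such tubelets into video-level tubes is a separate matching step not shown here.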
Pages: 68-84
Page count: 17