Temporal action detection in untrimmed videos is an important and challenging visual task. State-of-the-art methods typically adopt a multi-stage pipeline, i.e., class-agnostic segment proposal generation followed by multi-label action classification. This pipeline is computationally slow and hard to optimize because each stage must be trained separately. Moreover, a desirable method should go beyond segment-level localization and make dense predictions with precise boundaries. In this paper, we introduce a novel detection model, the Single-stage Multi-location Convolutional Network (SMC), which completely eliminates proposal generation and spatio-temporal feature resampling, and predicts frame-level action locations with class probabilities in a unified end-to-end network. Specifically, we associate a set of multi-scale default locations with each feature-map cell in multiple layers, then predict the offsets from the default locations as well as the action categories. In practice, SMC is faster than existing methods (753 FPS on a Titan X Maxwell GPU) and achieves state-of-the-art performance on THUMOS'14 and MEXaction2.
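The core mechanism described above, predicting per-cell offsets to multi-scale default locations together with class probabilities, follows the pattern of anchor-based single-shot detectors adapted to the temporal axis. The sketch below is a minimal illustration of that idea under stated assumptions, not the paper's actual architecture: the module name, the scale set, the two 1-D convolutional heads, and the SSD-style offset parameterization are all hypothetical choices made for the example.

```python
import torch
import torch.nn as nn


class TemporalDetectionHead(nn.Module):
    """Minimal sketch of a single-shot temporal detection head.

    For each cell of a 1-D feature map, predicts (center, length) offsets
    to a set of multi-scale default locations plus per-class scores.
    This is an assumption modeled on SSD-style detectors, not SMC's
    exact design.
    """

    def __init__(self, in_channels: int, num_classes: int, scales=(1.0, 1.5, 2.0)):
        super().__init__()
        self.scales = scales
        self.num_classes = num_classes
        k = len(scales)
        # One 1-D conv predicts 2 offsets (d_center, d_length) per default
        # location; another predicts class scores per default location.
        self.loc_head = nn.Conv1d(in_channels, k * 2, kernel_size=3, padding=1)
        self.cls_head = nn.Conv1d(in_channels, k * num_classes, kernel_size=3, padding=1)

    def forward(self, feat: torch.Tensor):
        # feat: (batch, channels, T) temporal feature map from one layer.
        b, _, t = feat.shape
        k = len(self.scales)
        loc = self.loc_head(feat).view(b, k, 2, t)                 # offsets
        cls = self.cls_head(feat).view(b, k, self.num_classes, t)  # class scores

        # Default locations: cell centers at each scale, in feature-map units.
        centers = torch.arange(t, dtype=feat.dtype) + 0.5          # (T,)
        scales = torch.tensor(self.scales, dtype=feat.dtype)       # (k,)
        d_center = centers.view(1, 1, t).expand(b, k, t)
        d_length = scales.view(1, k, 1).expand(b, k, t)

        # Decode offsets (SSD-style parameterization; an assumption here):
        # predicted center = default center + default length * offset_center,
        # predicted length = default length * exp(offset_length).
        pred_center = d_center + d_length * loc[:, :, 0]
        pred_length = d_length * torch.exp(loc[:, :, 1])
        return pred_center, pred_length, cls


# Toy usage: a 256-channel feature map with 32 temporal cells, 20 classes.
head = TemporalDetectionHead(in_channels=256, num_classes=20)
feat = torch.randn(1, 256, 32)
center, length, scores = head(feat)
print(center.shape, length.shape, scores.shape)
# torch.Size([1, 3, 32]) torch.Size([1, 3, 32]) torch.Size([1, 3, 20, 32])
```

In a full model of this kind, one such head would presumably be attached to each of several feature-map layers, so that defaults at coarser temporal resolutions cover longer action durations, consistent with the multi-layer, multi-scale association described in the abstract.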