Weakly Supervised Action Localization by Sparse Temporal Pooling Network

Cited by: 283

Authors:

Phuc Nguyen [1]

Liu, Ting [2]

Prasad, Gautam [2]

Han, Bohyung [3]

Affiliations:

[1] Univ Calif Irvine, Irvine, CA 92697 USA

[2] Google, Venice, CA USA

[3] Seoul Natl Univ, Seoul, South Korea

Source:

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) | 2018

Keywords:

DOI:

10.1109/CVPR.2018.00706

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

We propose a weakly supervised temporal action localization algorithm on untrimmed videos using convolutional neural networks. Our algorithm learns from video-level class labels and predicts temporal intervals of human actions with no requirement of temporal localization annotations. We design our network to identify a sparse subset of key segments associated with target actions in a video using an attention module and fuse the key segments through adaptive temporal pooling. Our loss function consists of two terms that minimize the video-level action classification error and enforce the sparsity of the segment selection. At inference time, we extract and score temporal proposals using temporal class activations and class-agnostic attentions to estimate the time intervals that correspond to target actions. The proposed algorithm attains state-of-the-art results on the THUMOS14 dataset and outstanding performance on ActivityNet1.3 even with its weak supervision.
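The two-term loss described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the linear classifier, the sigmoid cross-entropy, the L1 form of the sparsity term, and the weight `beta` are all assumptions made for illustration. It only shows how per-segment attention weights could pool segment features into one video-level feature, and how a classification term and a sparsity term could be combined.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def attention_pool(features, attention):
    """Attention-weighted sum of per-segment feature vectors.

    features:  list of T vectors (one per video segment)
    attention: list of T weights in [0, 1] (hypothetical attention outputs)
    """
    dim = len(features[0])
    pooled = [0.0] * dim
    for x_t, lam_t in zip(features, attention):
        for d in range(dim):
            pooled[d] += lam_t * x_t[d]
    return pooled

def video_loss(features, attention, weights, bias, labels, beta=0.1):
    """Classification term plus a sparsity term (both forms are assumptions).

    weights/bias: one linear classifier per action class (assumed)
    labels:       video-level 0/1 label per class
    beta:         assumed trade-off weight for the sparsity term
    """
    pooled = attention_pool(features, attention)
    cls_loss = 0.0
    for w_c, b_c, y_c in zip(weights, bias, labels):
        p = sigmoid(sum(w * v for w, v in zip(w_c, pooled)) + b_c)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        cls_loss -= y_c * math.log(p) + (1 - y_c) * math.log(1.0 - p)
    # L1 norm of the attention weights encourages selecting few segments
    sparsity = sum(abs(lam) for lam in attention)
    return cls_loss + beta * sparsity, cls_loss, sparsity

# Toy example: three segments, two feature dimensions, one action class.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attention = [1.0, 0.0, 0.5]
total, cls_loss, sparsity = video_loss(
    features, attention, weights=[[1.0, -1.0]], bias=[0.0], labels=[1]
)
```

In this toy setup the second segment receives zero attention and therefore contributes nothing to the pooled feature, which is the behaviour a sparsity penalty is meant to encourage.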

Pages: 6752-6761

Page count: 10