Weakly Supervised Action Localization by Sparse Temporal Pooling Network

Times cited: 283
Authors
Nguyen, Phuc [1]
Liu, Ting [2]
Prasad, Gautam [2]
Han, Bohyung [3]
Affiliations
[1] Univ Calif Irvine, Irvine, CA 92697 USA
[2] Google, Venice, CA USA
[3] Seoul Natl Univ, Seoul, South Korea
Source
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2018
DOI
10.1109/CVPR.2018.00706
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We propose a weakly supervised temporal action localization algorithm for untrimmed videos using convolutional neural networks. Our algorithm learns from video-level class labels and predicts the temporal intervals of human actions without requiring temporal localization annotations. We design our network to identify a sparse subset of key segments associated with target actions in a video using an attention module, and to fuse the key segments through adaptive temporal pooling. Our loss function comprises two terms: one minimizes the video-level action classification error, and the other enforces sparsity in the segment selection. At inference time, we extract and score temporal proposals using temporal class activations and class-agnostic attentions to estimate the time intervals that correspond to target actions. The proposed algorithm attains state-of-the-art results on the THUMOS14 dataset and outstanding performance on ActivityNet 1.3, despite relying only on weak supervision.
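The abstract describes three mechanisms: attention-weighted temporal pooling of segment features, a loss that combines video-level classification with a sparsity penalty on the attention weights, and proposal extraction from attention-gated temporal class activations. The pure-Python sketch below is one plausible reading of that description, not the authors' implementation. Assumptions not stated in the abstract: attention weights in [0, 1] are passed in directly rather than produced by a learned attention module; the pooled feature is averaged over the T segments; classification uses a per-class sigmoid with binary cross-entropy; sparsity is an L1 penalty scaled by a hypothetical weight `beta`; and a proposal is a run of consecutive segments whose score exceeds a hypothetical `threshold`, scored by its mean. All function names are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Vector = List[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention_pool(features: List[Vector], attention: Vector) -> Vector:
    """Fuse T segment features into one video feature, weighting segment t by attention[t].

    Dividing by T is an assumption of this sketch.
    """
    T = len(features)
    dim = len(features[0])
    return [sum(attention[t] * features[t][d] for t in range(T)) / T for d in range(dim)]


def stpn_loss(features: List[Vector], attention: Vector, weights: List[Vector],
              biases: Vector, labels: List[int], beta: float) -> float:
    """Video-level classification loss plus beta times the L1 norm of the attention."""
    pooled = attention_pool(features, attention)
    eps = 1e-12  # guards against log(0)
    cls_loss = 0.0
    for w, b, y in zip(weights, biases, labels):  # one (w, b, y) triple per action class
        p = sigmoid(dot(w, pooled) + b)
        cls_loss -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    sparsity = sum(abs(a) for a in attention)  # small when few segments are selected
    return cls_loss + beta * sparsity


def weighted_tcam(features: List[Vector], attention: Vector, w: Vector, b: float) -> Vector:
    """Per-segment evidence for one class, gated by the class-agnostic attention."""
    return [lam * sigmoid(dot(w, x) + b) for x, lam in zip(features, attention)]


def extract_proposals(scores: Vector, threshold: float) -> List[Tuple[int, int, float]]:
    """Group consecutive above-threshold segments into (start, end_exclusive, mean_score)."""
    proposals: List[Tuple[int, int, float]] = []
    start: Optional[int] = None
    for t, s in enumerate(scores + [float("-inf")]):  # sentinel closes a trailing run
        if s > threshold and start is None:
            start = t
        elif s <= threshold and start is not None:
            run = scores[start:t]
            proposals.append((start, t, sum(run) / len(run)))
            start = None
    return proposals
```

For example, with features `[[0, 0], [2, 1], [2, 1], [0, 0]]`, attention `[0.05, 0.9, 0.8, 0.1]`, a single class with weights `[1, 1]` and bias 0, and `threshold=0.5`, `extract_proposals` returns one interval covering segments 1 and 2 with a mean score of about 0.81. Gating the class activation by the attention weight means that a segment must be both class-relevant and selected by the attention to contribute to a proposal.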
Pages: 6752-6761
Number of pages: 10