Completeness Modeling and Context Separation for Weakly Supervised Temporal Action Localization

Cited by: 178
Authors
Liu, Daochang [1 ]
Jiang, Tingting [1 ]
Wang, Yizhou [1 ,2 ,3 ]
Affiliations
[1] Peking Univ, Sch EECS, Cooperat Medianet Innovat Ctr, NELVT, Beijing, Peoples R China
[2] Peng Cheng Lab, Shenzhen, Guangdong, Peoples R China
[3] Deepwise AI Lab, Beijing, Peoples R China
Source
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019) | 2019
DOI
10.1109/CVPR.2019.00139
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Temporal action localization is crucial for understanding untrimmed videos. In this work, we first identify two underexplored problems posed by the weak supervision for temporal action localization, namely action completeness modeling and action-context separation. Then by presenting a novel network architecture and its training strategy, the two problems are explicitly looked into. Specifically, to model the completeness of actions, we propose a multi-branch neural network in which branches are enforced to discover distinctive action parts. Complete actions can be therefore localized by fusing activations from different branches. And to separate action instances from their surrounding context, we generate hard negative data for training using the prior that motionless video clips are unlikely to be actions. Experiments performed on datasets THUMOS'14 and ActivityNet show that our framework outperforms state-of-the-art methods. In particular, the average mAP on ActivityNet v1.2 is significantly improved from 18.0% to 22.4%. Our code will be released soon.
Pages: 1298-1307
Page count: 10
Related papers
54 in total
  • [1] Human Activity Analysis: A Review
    Aggarwal, J. K.
    Ryoo, M. S.
    [J]. ACM COMPUTING SURVEYS, 2011, 43 (03)
  • [2] Action Search: Spotting Actions in Videos and Its Application to Temporal Action Localization
    Alwassel, Humam
    Heilbron, Fabian Caba
    Ghanem, Bernard
    [J]. COMPUTER VISION - ECCV 2018, PT IX, 2018, 11213 : 253 - 269
  • [3] [Anonymous], IEEE C COMP VIS PATT
  • [4] [Anonymous], 2015, IEEE C COMP VIS PATT
  • [5] A survey on deep learning based approaches for action and gesture recognition in image sequences
    Asadi-Aghbolaghi, Maryam
    Clapes, Albert
    Bellantonio, Marco
    Escalante, Hugo Jair
    Ponce-Lopez, Victor
    Baro, Xavier
    Guyon, Isabelle
    Kasaei, Shohreh
    Escalera, Sergio
    [J]. 2017 12TH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION (FG 2017), 2017, : 476 - 483
  • [6] Buch S., 2017, P BRIT MACH VIS C BM
  • [7] Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
    Carreira, Joao
    Zisserman, Andrew
    [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 4724 - 4733
  • [8] Rethinking the Faster R-CNN Architecture for Temporal Action Localization
    Chao, Yu-Wei
    Vijayanarasimhan, Sudheendra
    Seybold, Bryan
    Ross, David A.
    Deng, Jia
    Sukthankar, Rahul
    [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 1130 - 1139
  • [9] Courtney PG, 2015, IEEE COMP SEMICON
  • [10] Temporal Context Network for Activity Localization in Videos
    Dai, Xiyang
    Singh, Bharat
    Zhang, Guyue
    Davis, Larry S.
    Chen, Yan Qiu
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 5727 - 5736