Ensemble Prototype Network For Weakly Supervised Temporal Action Localization

Cited by: 2
Authors
Wu, Kewei [1]
Luo, Wenjie [1]
Xie, Zhao [1]
Guo, Dan [1, 2, 3]
Zhang, Zhao [1]
Hong, Richang [1]
Affiliations
[1] Hefei Univ Technol, Sch Comp & Informat, Hefei 230009, Peoples R China
[2] Hefei Comprehens Natl Sci Ctr, Inst Artificial Intelligence, Hefei 230039, Peoples R China
[3] Anhui Zhonghuitong Technol Co Ltd, Hefei 230088, Peoples R China
Keywords
Prototypes; Location awareness; Ensemble learning; Annotations; Proposals; Feature extraction; Training; Consensus-aware clustering; ensemble learning; prototype learning; weakly supervised temporal action localization (TAL);
DOI
10.1109/TNNLS.2024.3377468
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Weakly supervised temporal action localization (TAL) aims to localize action instances in untrimmed videos using only video-level action labels. Without snippet-level labels, it is hard to assign accurate action/background categories to all snippets. The main difficulties are the large variations introduced by unconstrained background snippets and by multiple subactions within action snippets. The existing prototype model describes snippets by covering them with clusters (defined as prototypes). In this work, we argue that clustered prototypes, which cover snippets with simple variations, still misclassify snippets with large variations. We propose an ensemble prototype network (EPNet), which ensembles prototypes learned with consensus-aware clustering. The network stacks a consensus prototype learning (CPL) module and an ensemble snippet weight learning (ESWL) module as one stage and extends one stage to multiple stages in an ensemble learning way. The CPL module learns the consensus matrix by estimating the similarity of clustering labels between two successive clustering generations. The consensus matrix optimizes the clustering to learn consensus prototypes, which can predict the snippets with consensus labels. The ESWL module estimates the weights of the misclassified snippets using the snippet-level loss. The weights update the posterior probabilities of the snippets in the clustering to learn prototypes in the next stage. We use multiple stages to learn multiple prototypes, which can cover the snippets with large variations for accurate snippet classification. Extensive experiments show that our method achieves state-of-the-art performance among weakly supervised TAL methods on two benchmarks, THUMOS'14 and ActivityNet (v1.2 and v1.3).
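The two ideas named in the abstract, a consensus matrix between successive clusterings and a loss-driven snippet reweighting, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the formulas, so the sketch assumes (a) the consensus matrix marks snippet pairs that share a cluster in both generations, and (b) the weights are updated with an AdaBoost-style exponential of the snippet-level loss. The function names `co_association`, `consensus_matrix`, and `reweight_snippets` are hypothetical.

```python
import numpy as np


def co_association(labels):
    """Return an N x N matrix with 1.0 where two snippets share a cluster label."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)


def consensus_matrix(labels_prev, labels_curr):
    """Assumed consensus: a pair counts only if it is co-clustered in BOTH generations."""
    return co_association(labels_prev) * co_association(labels_curr)


def reweight_snippets(weights, losses):
    """Assumed AdaBoost-style update: snippets with higher loss get larger weight."""
    weights = np.asarray(weights, dtype=float)
    losses = np.asarray(losses, dtype=float)
    # Subtract the max loss before exponentiating for numerical stability.
    scaled = weights * np.exp(losses - losses.max())
    return scaled / scaled.sum()


# Toy example: 4 snippets, two successive clustering generations.
labels_prev = [0, 0, 1, 1]
labels_curr = [0, 0, 0, 1]
C = consensus_matrix(labels_prev, labels_curr)

# Snippet 2 has a large snippet-level loss, so its weight should grow.
new_w = reweight_snippets(np.full(4, 0.25), [0.1, 0.1, 2.0, 0.1])
```

In this toy case only snippets 0 and 1 stay together across both generations, so they are the only off-diagonal consensus pair, and the high-loss snippet ends up with the largest weight for the next stage.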
Pages: 1 / 15
Number of pages: 15
Related papers
50 in total
  • [1] Ensemble Prototype Network For Weakly Supervised Temporal Action Localization
    Wu, Kewei
    Luo, Wenjie
    Xie, Zhao
    Guo, Dan
    Zhang, Zhao
    Hong, Richang
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2025, 36 (03) : 4560 - 4574
  • [2] Snippet-to-Prototype Contrastive Consensus Network for Weakly Supervised Temporal Action Localization
    Shao, Yuxiang
    Zhang, Feifei
    Xu, Changsheng
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 6717 - 6729
  • [3] Action Coherence Network for Weakly Supervised Temporal Action Localization
    Zhai, Yuanhao
    Wang, Le
    Liu, Ziyi
    Zhang, Qilin
    Hua, Gang
    Zheng, Nanning
    2019 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2019, : 3696 - 3700
  • [4] Action Coherence Network for Weakly-Supervised Temporal Action Localization
    Zhai, Yuanhao
    Wang, Le
    Tang, Wei
    Zhang, Qilin
    Zheng, Nanning
    Hua, Gang
    IEEE TRANSACTIONS ON MULTIMEDIA, 2022, 24 : 1857 - 1870
  • [5] Action Unit Memory Network for Weakly Supervised Temporal Action Localization
    Luo, Wang
    Zhang, Tianzhu
    Yang, Wenfei
    Liu, Jingen
    Mei, Tao
    Wu, Feng
    Zhang, Yongdong
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 9964 - 9974
  • [6] Complementary Attention Network for Weakly Supervised Temporal Action Localization
    Dou, Peng
    Hu, Haifeng
    NEURAL PROCESSING LETTERS, 2023, 55 (05) : 6713 - 6732
  • [7] Relational Prototypical Network for Weakly Supervised Temporal Action Localization
    Huang, Linjiang
    Huang, Yan
    Ouyang, Wanli
    Wang, Liang
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 11053 - 11060
  • [8] Weakly Supervised Action Localization by Sparse Temporal Pooling Network
    Phuc Nguyen
    Liu, Ting
    Prasad, Gautam
    Han, Bohyung
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 6752 - 6761
  • [10] Foreground-Action Consistency Network for Weakly Supervised Temporal Action Localization
    Huang, Linjiang
    Wang, Liang
    Li, Hongsheng
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 7982 - 7991