Integration of Global and Local Knowledge for Foreground Enhancing in Weakly Supervised Temporal Action Localization

Cited by: 0
Authors
Zhang, Tianyi [1]
Li, Ronglu [2]
Feng, Pengming [3]
Zhang, Rubo [2]
Affiliations
[1] Beihang Univ, Sch Cyber Sci & Technol, Beijing 100191, Peoples R China
[2] Dalian Minzu Univ, Coll Mech & Elect Engn, Dalian 116600, Peoples R China
[3] CAST, State Key Lab Space Ground Integrated Informat Tec, Beijing 100095, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Weakly supervised learning; temporal action localization; video content analysis; event detection
DOI
10.1109/TMM.2024.3379887
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Weakly Supervised Temporal Action Localization (WTAL) aims to identify the temporal extent of actions and classify their categories using only video-level labels during training. Motivated by the intuition that attention maps generated from different views help enhance foreground action segments, we propose a WTAL pipeline based on a novel attention mechanism that integrates global and local knowledge. The mechanism consists mainly of a global attention branch and a local attention branch. Specifically, the global attention branch builds on inter-segment similarity to sparsely mine correlation knowledge across the entire video, while the local attention branch builds on a convolutional structure to densely aggregate information within a fixed local receptive field. Experiments on the THUMOS14 and ActivityNet v1.3 datasets demonstrate the effectiveness of the proposed pipeline compared with state-of-the-art methods.
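The two-branch idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract gives no architectural details, so everything below is an assumption made for illustration. The global branch is approximated by the mean cosine similarity of each segment to its `top_k` most similar segments anywhere in the video, the local branch by a fixed (not learned) 1D smoothing kernel over per-segment feature magnitudes, and the fusion by a weighted average of min-max normalized scores. All function names and parameters (`global_attention`, `local_attention`, `fused_attention`, `top_k`, `kernel`, `weight`) are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def global_attention(feats, top_k=2):
    """Assumed global branch: for each segment, keep only its top_k most
    similar segments from the whole video (sparse) and average them."""
    scores = []
    for i, f in enumerate(feats):
        sims = sorted(
            (cosine(f, g) for j, g in enumerate(feats) if j != i),
            reverse=True,
        )
        kept = sims[:top_k]
        scores.append(sum(kept) / len(kept) if kept else 0.0)
    return scores


def local_attention(feats, kernel=(0.25, 0.5, 0.25)):
    """Assumed local branch: zero-padded 1D convolution with a fixed kernel
    over per-segment feature magnitudes (a fixed local receptive field)."""
    mags = [math.sqrt(sum(a * a for a in f)) for f in feats]
    half = len(kernel) // 2
    out = []
    for t in range(len(mags)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t + k - half
            if 0 <= idx < len(mags):
                acc += w * mags[idx]
        out.append(acc)
    return out


def min_max(xs):
    """Rescale scores to [0, 1]; constant input maps to all zeros."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0] * len(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def fused_attention(feats, top_k=2, weight=0.5):
    """Assumed fusion: weighted average of the two normalized branches."""
    g = min_max(global_attention(feats, top_k))
    l = min_max(local_attention(feats))
    return [weight * a + (1 - weight) * b for a, b in zip(g, l)]


if __name__ == "__main__":
    # Six toy segments with 2-D features; segments 2 and 3 mimic an action.
    video = [[0.1, 0.0], [0.1, 0.1], [2.0, 0.1],
             [2.0, 0.0], [0.0, 0.1], [0.1, 0.1]]
    for t, score in enumerate(fused_attention(video)):
        print(t, round(score, 3))
```

In this toy setup the global branch alone cannot separate segment 0 from the action segments, because cosine similarity ignores feature magnitude, while the local branch alone spreads weight onto the neighbors of the action. Combining the two views is what concentrates the fused score on segments 2 and 3, which is the intuition the abstract appeals to.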
Pages: 8476-8487
Page count: 12
Related papers
50 in total
  • [1] Foreground-Action Consistency Network for Weakly Supervised Temporal Action Localization
    Huang, Linjiang
    Wang, Liang
    Li, Hongsheng
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021: 7982 - 7991
  • [2] Collaborative Foreground, Background, and Action Modeling Network for Weakly Supervised Temporal Action Localization
    Moniruzzaman, Md.
    Yin, Zhaozheng
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (11) : 6939 - 6951
  • [3] GLNet: Global Local Network for Weakly Supervised Action Localization
    Zhang, Shiwei
    Song, Lin
    Gao, Changxin
    Sang, Nong
    IEEE TRANSACTIONS ON MULTIMEDIA, 2020, 22 (10) : 2610 - 2622
  • [5] Deep feature enhancing and selecting network for weakly supervised temporal action localization
    Yu, Jiaruo
    Ge, Yongxin
    Qin, Xiaolei
    Li, Ziqiang
    Huang, Sheng
    Chen, Feiyu
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2021, 80
  • [6] Weakly supervised foreground learning for weakly supervised localization and detection
    Zhang, Chen-Lin
    Li, Yin
    Wu, Jianxin
    PATTERN RECOGNITION, 2023, 137
  • [7] Weakly supervised temporal action localization: a survey
    Li, Ronglu
    Zhang, Tianyi
    Zhang, Rubo
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (32) : 78361 - 78386
  • [8] Temporal Dropout for Weakly Supervised Action Localization
    Xie, Chi
    Zhuang, Zikun
    Zhao, Shengjie
    Liang, Shuang
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, 19 (03)
  • [9] Action Shuffling for Weakly Supervised Temporal Localization
    Zhang, Xiao-Yu
    Shi, Haichao
    Li, Changsheng
    Shi, Xinchu
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 4447 - 4457
  • [10] Weakly Supervised Temporal Action Localization via Representative Snippet Knowledge Propagation
    Huang, Linjiang
    Wang, Liang
    Li, Hongsheng
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022: 3262 - 3271