A Snippets Relation and Hard-Snippets Mask Network for Weakly-Supervised Temporal Action Localization

被引:0
|
作者
Zhao, Yibo [1 ]
Zhang, Hua [1 ]
Gao, Zan [1 ,2 ]
Guan, Weili [3 ]
Wang, Meng [4 ]
Chen, Shengyong [1 ]
机构
[1] Tianjin Univ Technol, Key Lab Comp Vis & Syst, Minist Educ, Tianjin 300384, Peoples R China
[2] Qilu Univ Technol, Shandong Acad Sci, Shandong Artificial Intelligence Inst, Jinan 250014, Peoples R China
[3] Monash Univ, Fac Informat Technol, Clayton Campus, Clayton, Vic 3800, Australia
[4] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei 230009, Peoples R China
基金
中国国家自然科学基金;
关键词
Location awareness; Task analysis; Proposals; Circuits and systems; Uncertainty; Prototypes; Multitasking; Weakly-supervised temporal action localization; snippets relation module; hard-snippets mask module; snippet enhancement loss; DISTILLATION;
D O I
10.1109/TCSVT.2024.3374870
中图分类号
TM [电工技术]; TN [电子技术、通信技术];
学科分类号
0808 ; 0809 ;
摘要
Weakly-supervised temporal action localization (WTAL) is a problem learning an action localization model with only video-level labels available. In recent years, many WTAL methods have developed. However, hard-to-predict snippets near action boundaries are often not considered in these existing approaches, causing action incompleteness and action over-complete issues. To solve these issues, in this work, an end-to-end snippets relation and hard-snippets mask network (SRHN) is proposed. Specifically, a hard-snippets mask module is applied to mask the hard-to-predict snippets adaptively, and in this way, the trained model focuses more on those snippets with low uncertainty. Then, a snippets relation module is designed to capture the relationship among snippets and can make hard-to-predict snippets easy to predict by aggregating the information of multiple temporal receptive fields. Finally, a snippet enhancement loss is further developed to reduce the action probabilities that are not present in videos for hard-to-predict snippets and other snippets, enlarging the action probabilities that exist in videos. Extensive experiments on THUMOS14, ActivityNet1.2, and ActivityNet1.3 datasets demonstrate the effectiveness of the SRHN method.
引用
收藏
页码:7202 / 7215
页数:14
相关论文
共 50 条
  • [1] Unleashing the Potential of Adjacent Snippets for Weakly-supervised Temporal Action Localization
    Liu, Qinying
    Wang, Zilei
    Chen, Ruoxi
    Li, Zhilin
    2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 1032 - 1037
  • [2] Unleashing the Potential of Adjacent Snippets for Weakly-supervised Temporal Action Localization
    Liu, Qinying
    Wang, Zilei
    Chen, Ruoxi
    Li, Zhilin
    Proceedings - IEEE International Conference on Multimedia and Expo, 2023, 2023-July : 1032 - 1037
  • [3] Action Coherence Network for Weakly-Supervised Temporal Action Localization
    Zhai, Yuanhao
    Wang, Le
    Tang, Wei
    Zhang, Qilin
    Zheng, Nanning
    Hua, Gang
    IEEE TRANSACTIONS ON MULTIMEDIA, 2022, 24 : 1857 - 1870
  • [4] Background Suppression Network for Weakly-Supervised Temporal Action Localization
    Lee, Pilhyeon
    Uh, Youngjung
    Byun, Hyeran
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 11320 - 11327
  • [5] Feature Matching Network for Weakly-Supervised Temporal Action Localization
    Dou, Peng
    Zhou, Wei
    Liao, Zhongke
    Hu, Haifeng
    PATTERN RECOGNITION AND COMPUTER VISION, PT IV, 2021, 13022 : 459 - 471
  • [6] Weakly-supervised temporal action localization: a survey
    AbdulRahman Baraka
    Mohd Halim Mohd Noor
    Neural Computing and Applications, 2022, 34 : 8479 - 8499
  • [7] Weakly-supervised temporal action localization: a survey
    Baraka, AbdulRahman
    Noor, Mohd Halim Mohd
    NEURAL COMPUTING & APPLICATIONS, 2022, 34 (11): : 8479 - 8499
  • [8] Deep cascaded action attention network for weakly-supervised temporal action localization
    Hui-fen Xia
    Yong-zhao Zhan
    Multimedia Tools and Applications, 2023, 82 : 29769 - 29787
  • [9] Deep cascaded action attention network for weakly-supervised temporal action localization
    Xia, Hui-fen
    Zhan, Yong-zhao
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (19) : 29769 - 29787
  • [10] ACGNet: Action Complement Graph Network for Weakly-Supervised Temporal Action Localization
    Yang, Zichen
    Qin, Jie
    Huang, Di
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 3090 - 3098