Weakly-supervised Temporal Action Localization with Adaptive Clustering and Refining Network

Cited by: 3
Authors
Ren, Hao [1]
Ran, Wu [1]
Liu, Xingson [1]
Ren, Haoran [1]
Lu, Hong [1]
Zhang, Rui [1]
Jin, Cheng [1, 2]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai Key Lab Intelligent Informat Proc, Shanghai, Peoples R China
[2] Peng Cheng Lab, Shenzhen, Peoples R China
Source
2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME | 2023
Keywords
Temporal Action Localization; Weakly-supervised Learning; Adaptive Clustering
DOI
10.1109/ICME55011.2023.00177
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The weakly-supervised temporal action localization task aims to localize the temporal boundaries of action instances using only video-level labels. Existing methods primarily adopt a Multi-Instance Learning (MIL) scheme to handle this task. The effectiveness of the MIL scheme depends heavily on the selection of the top-k action snippets, which is unstable and requires manual tuning. To address these deficiencies, we propose an Adaptive Clustering and Refining Network (ACRNet). Specifically, we present an action-aware clustering strategy that separates the action and background snippets of diverse videos based on the intra-class activation distribution; the strategy adapts to each video and requires no manual tuning. In addition, a cluster-refining step eliminates false action snippets by considering the inter-class activation distribution, which greatly improves robustness and localization accuracy. Extensive experiments on the THUMOS14, ActivityNet 1.2, and ActivityNet 1.3 benchmarks show that our method achieves state-of-the-art performance.
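The abstract contrasts fixed top-k MIL pooling with an adaptive split of action and background snippets, but gives no formulas. The following Python sketch illustrates that contrast under stated assumptions; it is not ACRNet's implementation. It assumes a per-video class activation sequence `cas` stored as a list of T snippets, each holding C class scores. `topk_mil_score` shows the fixed-k pooling the abstract criticizes. `cluster_action_snippets` swaps in plain 1-D 2-means over one class's snippet scores as a stand-in for the paper's action-aware clustering, so no k has to be tuned. `refine_with_other_classes` is a hypothetical simplification of the inter-class refining step that drops snippets where another class scores higher. All function names are illustrative.

```python
from typing import List

# Assumed layout (hypothetical): cas[t][c] is the activation of class c at snippet t.
CAS = List[List[float]]


def topk_mil_score(cas: CAS, c: int, k: int) -> float:
    """Top-k MIL pooling: average the k highest snippet activations of class c.
    k is a manually tuned hyperparameter, the sensitivity the abstract cites."""
    if k < 1:
        raise ValueError("k must be at least 1")
    top = sorted((row[c] for row in cas), reverse=True)[:k]
    return sum(top) / len(top)


def cluster_action_snippets(cas: CAS, c: int, max_iters: int = 50) -> List[bool]:
    """Hypothetical stand-in for action-aware clustering: 1-D 2-means on the
    activations of class c (its intra-class distribution). Snippets in the
    high-mean cluster are marked as action; no k has to be chosen."""
    scores = [row[c] for row in cas]
    low_mean, high_mean = min(scores), max(scores)
    if low_mean == high_mean:
        return [False] * len(scores)  # flat activations: nothing stands out
    for _ in range(max_iters):
        threshold = (low_mean + high_mean) / 2
        # Both groups are non-empty: the threshold lies strictly between
        # two means of actual scores.
        high = [s for s in scores if s > threshold]
        low = [s for s in scores if s <= threshold]
        new_low, new_high = sum(low) / len(low), sum(high) / len(high)
        if (new_low, new_high) == (low_mean, high_mean):
            break  # assignments stopped changing
        low_mean, high_mean = new_low, new_high
    threshold = (low_mean + high_mean) / 2
    return [s > threshold for s in scores]


def refine_with_other_classes(cas: CAS, c: int, action_mask: List[bool]) -> List[bool]:
    """Hypothetical refinement using the inter-class distribution: keep an
    action snippet only if class c has the highest activation at that snippet."""
    return [keep and row[c] >= max(row) for keep, row in zip(action_mask, cas)]
```

For example, with `cas = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.9], [0.1, 0.05], [0.2, 0.1]]` and class 0, the 2-means split marks the first three snippets as action, and the refining rule then drops the third because class 1 scores higher there. The threshold is derived from each video's own score distribution rather than from a global k, which is the property the abstract emphasizes; the actual network learns these decisions with losses the abstract does not describe.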
Pages: 1008-1013
Number of pages: 6