Weakly-supervised Temporal Action Localization with Adaptive Clustering and Refining Network

Times cited: 3

Authors:

Ren, Hao [1]

Ran, Wu [1]

Liu, Xingson [1]

Ren, Haoran [1]

Lu, Hong [1]

Zhang, Rui [1]

Jin, Cheng [1,2]

Affiliations:

[1] Fudan Univ, Sch Comp Sci, Shanghai Key Lab Intelligent Informat Proc, Shanghai, Peoples R China

[2] Peng Cheng Lab, Shenzhen, Peoples R China

Source:

2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME | 2023

Keywords:

Temporal Action Localization; Weakly-supervised Learning; Adaptive Clustering

DOI:

10.1109/ICME55011.2023.00177

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

The weakly-supervised temporal action localization task aims to localize the temporal boundaries of action instances using only video-level labels. Existing methods primarily adopt a Multiple Instance Learning (MIL) scheme to handle this task. The effectiveness of the MIL scheme depends heavily on the selection of the top-k action snippets, which is unstable and requires manual tuning. To address these deficiencies, we propose an Adaptive Clustering and Refining Network (ACRNet). Specifically, we present an action-aware clustering strategy that separates action and background snippets based on the intra-class activation distribution; it adapts to diverse videos and requires no manual tuning. In addition, a cluster refining step eliminates false action snippets by considering the inter-class activation distribution, which greatly improves robustness and localization accuracy. Extensive experiments on the THUMOS14, ActivityNet 1.2, and ActivityNet 1.3 benchmarks show that our method achieves state-of-the-art performance.
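To make the contrast drawn in the abstract concrete, here is a minimal Python sketch under stated assumptions; it is not ACRNet's implementation. It pools one class's snippet activations with fixed top-k MIL pooling (k being the hand-tuned hyperparameter the abstract criticizes), then splits the same activations into action and background clusters with a plain 1-D 2-means, which we chose as a stand-in for "adaptive clustering"; the paper's action-aware clustering may work differently. The refinement rule (keep a snippet only if the class has the highest activation there) is likewise a hypothetical illustration of using the inter-class activation distribution. All function names are illustrative.

```python
"""Hypothetical sketch contrasting fixed top-k MIL pooling with an adaptive
two-cluster split of snippet activations. Not the ACRNet authors' code."""
from typing import List, Tuple


def topk_mil_score(cas: List[float], k: int) -> float:
    """Video-level score for one class: mean of the k highest snippet activations.

    This is standard top-k MIL pooling; k is a hand-tuned hyperparameter.
    """
    if not 1 <= k <= len(cas):
        raise ValueError("k must lie in [1, len(cas)]")
    return sum(sorted(cas, reverse=True)[:k]) / k


def two_means_split(cas: List[float], max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Split snippet indices into (action, background) via 1-D 2-means.

    Assumption: plain 2-means stands in for the paper's action-aware clustering.
    The size of the action cluster is decided by the data, not by a fixed k.
    """
    if not cas:
        return [], []
    c_bg, c_act = min(cas), max(cas)  # initialise centroids at the extremes
    action: List[int] = []
    for _ in range(max_iter):
        # Assign each snippet to the nearer centroid (ties go to "action").
        new_action = [i for i, a in enumerate(cas) if abs(a - c_act) <= abs(a - c_bg)]
        if new_action == action:  # assignments are stable: converged
            break
        action = new_action
        members = set(action)
        background = [i for i in range(len(cas)) if i not in members]
        c_act = sum(cas[i] for i in action) / len(action)
        if background:  # keep the old centroid if the background cluster is empty
            c_bg = sum(cas[i] for i in background) / len(background)
    members = set(action)
    return action, [i for i in range(len(cas)) if i not in members]


def refine_with_other_classes(action: List[int], cas_all: List[List[float]], c: int) -> List[int]:
    """Hypothetical inter-class refinement: drop snippet t from class c's action
    cluster when another class has a higher activation at t.

    cas_all[k][t] is the activation of class k at snippet t.
    """
    return [t for t in action if cas_all[c][t] >= max(row[t] for row in cas_all)]
```

For example, with activations [0.1, 0.2, 0.9, 0.8, 0.15, 0.85, 0.05], the 2-means split marks snippets 2, 3 and 5 as action without choosing any k, whereas top-k pooling returns 0.875 for k = 2 but 0.85 for k = 3. If a second class peaks at snippet 3, the refinement step removes that snippet from the first class's action cluster.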


Pages: 1008-1013

Number of pages: 6

References (24 total):

[1] Heilbron FC, 2015, PROC CVPR IEEE, P961, DOI 10.1109/CVPR.2015.7298698

[2] Carreira, Joao; Zisserman, Andrew. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 4724-4733.

[3] Chen, Mengyuan; Gao, Junyu; Yang, Shicai; Xu, Changsheng. Dual-Evidential Learning for Weakly-supervised Temporal Action Localization. COMPUTER VISION - ECCV 2022, PT IV, 2022, 13664: 192-208.

[4] Shi, Haichao; Zhang, Xiao-Yu; Li, Changsheng; Gong, Lixing; Li, Yong; Bao, Yongjun. Dynamic Graph Modeling for Weakly-Supervised Temporal Action Localization. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022: 3820-3828.

[5] He, Bo; Yang, Xitong; Kang, Le; Cheng, Zhiyu; Zhou, Xin; Shrivastava, Abhinav. ASM-Loc: Action-aware Segment Modeling for Weakly-Supervised Temporal Action Localization. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022: 13915-13925.

[6] Idrees, Haroon; Zamir, Amir R.; Jiang, Yu-Gang; Gorban, Alex; Laptev, Ivan; Sukthankar, Rahul; Shah, Mubarak. The THUMOS challenge on action recognition for videos "in the wild". COMPUTER VISION AND IMAGE UNDERSTANDING, 2017, 155: 1-23.

[7] King DB, 2015, ACS SYM SER, V1214, P1, DOI 10.1021/bk-2015-1214.ch001

[8] Lee P, 2021, AAAI CONF ARTIF INTE, V35, P1854

[9] Lee P, 2020, AAAI CONF ARTIF INTE, V34, P11320

[10] Lin, Tianwei; Zhao, Xu; Su, Haisheng; Wang, Chongjing; Yang, Ming. BSN: Boundary Sensitive Network for Temporal Action Proposal Generation. COMPUTER VISION - ECCV 2018, PT IV, 2018, 11208: 3-21.
