Temporal RPN Learning for Weakly-Supervised Temporal Action Localization

Cited by: 0
Authors
Huang, Jing [1]
Kong, Ming [2, 3]
Chen, Luyuan [4]
Liang, Tian [1]
Zhu, Qiang [2]
Affiliations
[1] Zhejiang Univ, Hangzhou 310058, Peoples R China
[2] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 310058, Peoples R China
[3] Hikvis Res Inst, Hangzhou 310051, Peoples R China
[4] Beijing Informat Sci & Technol Univ, Beijing 100101, Peoples R China
Source
ASIAN CONFERENCE ON MACHINE LEARNING, VOL 222 | 2023 / Vol. 222
Keywords
Weakly-Supervised Learning; Action Localization; Temporal Region Proposal;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Weakly-Supervised Temporal Action Localization (WSTAL) aims to train an action instance localization model from untrimmed videos using only video-level labels, a setting similar to the Object Detection (OD) task. Existing Top-k MIL-based WSTAL methods cannot flexibly define the learning space, which limits the model's learning efficiency and performance. Faster R-CNN is a classic two-stage object detection architecture with an efficient Region Proposal Network (RPN). This paper migrates a Faster R-CNN-like two-stage architecture to the WSTAL task in two steps: first, it builds a Temporal RPN (T-RPN) and integrates it with the traditional WSTAL framework; second, it proposes a pseudo-label generation mechanism that enables T-RPN learning without temporal annotations. The new framework achieves breakthrough performance on the THUMOS-14 and ActivityNet-v1.2 datasets, and comprehensive ablation experiments verify the effectiveness of these innovations. Code will be available at: https://github.com/ZJUHJ/TRPN.
Pages: 16
Related papers
38 in total
[1]   Heilbron FC, 2015, PROC CVPR IEEE, P961, DOI 10.1109/CVPR.2015.7298698
[2]   Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset [J].
Carreira, Joao ;
Zisserman, Andrew .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :4724-4733
[3]   Rethinking the Faster R-CNN Architecture for Temporal Action Localization [J].
Chao, Yu-Wei ;
Vijayanarasimhan, Sudheendra ;
Seybold, Bryan ;
Ross, David A. ;
Deng, Jia ;
Sukthankar, Rahul .
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, :1130-1139
[4]   Dual-Evidential Learning for Weakly-supervised Temporal Action Localization [J].
Chen, Mengyuan ;
Gao, Junyu ;
Yang, Shicai ;
Xu, Changsheng .
COMPUTER VISION - ECCV 2022, PT IV, 2022, 13664 :192-208
[5]   Attention-based Dropout Layer for Weakly Supervised Object Localization [J].
Choe, Junsuk ;
Shim, Hyunjung .
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, :2214-2223
[6]   MIST: Multiple Instance Self-Training Framework for Video Anomaly Detection [J].
Feng, Jia-Chang ;
Hong, Fa-Ting ;
Zheng, Wei-Shi .
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, :14004-14013
[7]   Fine-grained Temporal Contrastive Learning for Weakly-supervised Temporal Action Localization [J].
Gao, Junyu ;
Chen, Mengyuan ;
Xu, Changsheng .
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, :19967-19977
[8]   ASM-Loc: Action-aware Segment Modeling for Weakly-Supervised Temporal Action Localization [J].
He, Bo ;
Yang, Xitong ;
Kang, Le ;
Cheng, Zhiyu ;
Zhou, Xin ;
Shrivastava, Abhinav .
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, :13915-13925
[9]   Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition [J].
He, Kaiming ;
Zhang, Xiangyu ;
Ren, Shaoqing ;
Sun, Jian .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2015, 37 (09) :1904-1916
[10]   Cross-modal Consensus Network for Weakly Supervised Temporal Action Localization [J].
Hong, Fa-Ting ;
Feng, Jia-Chang ;
Xu, Dan ;
Shan, Ying ;
Zheng, Wei-Shi .
PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, :1591-1599