Action Shuffling for Weakly Supervised Temporal Localization

Times cited: 6
Authors
Zhang, Xiao-Yu [1]
Shi, Haichao [1, 2]
Li, Changsheng [3]
Shi, Xinchu [4]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing 100093, Peoples R China
[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing 101408, Peoples R China
[3] Beijing Inst Technol, Beijing 100081, Peoples R China
[4] Meituan Grp, Beijing 100102, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Feature extraction; Location awareness; Training; Task analysis; Annotations; Semantics; Network architecture; Temporal action localization; self-supervised; inter-action; intra-action; ACTION RECOGNITION;
DOI
10.1109/TIP.2022.3185485
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Weakly supervised action localization is a challenging task with extensive applications. It aims to identify actions and their temporal intervals when only video-level annotations are available. This paper analyzes the order-sensitive and location-insensitive properties of actions and embeds them in a self-augmented learning framework to improve weakly supervised action localization performance. Specifically, we propose a novel two-branch network architecture with intra/inter-action shuffling, referred to as ActShufNet. The intra-action shuffling branch sets up a self-supervised order prediction task to augment the video representation with inner-video relevance, whereas the inter-action shuffling branch reorganizes the existing action contents to augment the training set without resorting to any external resources. Furthermore, global-local adversarial training is introduced to enhance the model's robustness to irrelevant noise. Extensive experiments on three benchmark datasets demonstrate the efficacy of the proposed method.
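The two shuffling ideas named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the representation of a video as a list of segments, and the choice to label a synthetic video with the union of its source labels are all assumptions made here for illustration.

```python
import random
from itertools import permutations


def intra_action_shuffle(segments, rng):
    """Permute the segments of one action and return (shuffled, label).

    Hypothetical helper: the label is the index of the applied permutation,
    which an order-prediction head could be trained to recover.
    """
    perms = list(permutations(range(len(segments))))
    label = rng.randrange(len(perms))
    order = perms[label]
    return [segments[i] for i in order], label


def inter_action_shuffle(videos, rng):
    """Concatenate whole actions in a random order to form a new sample.

    Hypothetical helper: `videos` is a list of (segments, class_label) pairs.
    Segments inside each action keep their order; only the position of each
    action changes. The new video-level label set is assumed to be the union
    of the source labels.
    """
    picked = list(videos)
    rng.shuffle(picked)
    segments = [seg for segs, _ in picked for seg in segs]
    labels = sorted({cls for _, cls in picked})
    return segments, labels


if __name__ == "__main__":
    rng = random.Random(0)
    print(intra_action_shuffle(["s0", "s1", "s2"], rng))
    print(inter_action_shuffle([(["r1", "r2"], "run"), (["j1"], "jump")], rng))
```

The first helper reflects the order-sensitive property (the permutation is the supervision signal), and the second reflects the location-insensitive property (moving a whole action does not change the video-level labels).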
Pages: 4447-4457
Number of pages: 11