TSCANet: a two-stream context aggregation network for weakly-supervised temporal action localization

Times cited: 0
Authors
Zhang, Haiping [1,2]
Lin, Haixiang [1]
Wang, Dongjing [1]
Xu, Dongyang [1]
Zhou, Fuxing [2]
Guan, Liming [2]
Yu, Dongjing [1]
Fang, Xujian [2]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Comp Sci, Hangzhou 310018, Zhejiang, Peoples R China
[2] Hangzhou Dianzi Univ, Sch Informat Engn, Hangzhou 311305, Zhejiang, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Temporal action localization; Weakly supervised learning; Feature alignment; Mutual learning
DOI
10.1007/s11227-024-06810-6
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Weakly supervised temporal action localization (WSTAL) classifies and localizes actions in untrimmed videos using only video-level labels. Many current methods rely on feature extractors originally designed for action classification on trimmed videos. Such extractors can introduce information that is redundant for the localization task, which lowers localization accuracy. To overcome this limitation, we propose a WSTAL method based on a two-stream context aggregation network (TSCANet) with two main modules: a multistage temporal feature aggregation module (MSTFA) and a feature alignment module (FA). By stacking dilated convolutional layers, MSTFA lets TSCANet rapidly expand its receptive field and capture temporal dependencies between distant segments. This allows the model to better aggregate temporal information in the optical flow features and to reduce redundant information in the original features. To avoid inconsistencies between the enhanced optical flow features and the RGB features, we design FA to calibrate the RGB features with the optimized optical flow features through mutual learning. Extensive comparative experiments on the THUMOS14 and ActivityNet datasets show improved localization performance; in particular, at low t-IoU thresholds the method outperforms many existing WSTAL methods.
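The receptive-field argument for MSTFA can be made concrete with a short calculation. The sketch below is a plain-Python illustration, not TSCANet's implementation: it assumes stride-1, zero-padded 1-D convolutions over per-snippet scalars, a kernel size of 3, and dilations that double per layer (1, 2, 4, 8). The abstract does not state the actual kernel sizes, dilation schedule, or channel widths, and the function names are hypothetical.

```python
from typing import List


def dilated_conv1d(x: List[float], weights: List[float], dilation: int) -> List[float]:
    """Stride-1 1-D convolution with the given dilation and zero padding.

    out[t] = sum_i weights[i] * x[t + (i - center) * dilation], where center is
    the middle tap, so the output has the same length as the input.
    """
    center = len(weights) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(weights):
            j = t + (i - center) * dilation
            if 0 <= j < len(x):  # taps that fall outside the sequence read zero
                acc += w * x[j]
        out.append(acc)
    return out


def dilated_stack(x: List[float], weights: List[float], dilations: List[int]) -> List[float]:
    """Apply one dilated convolution per entry in `dilations`, in order."""
    for d in dilations:
        x = dilated_conv1d(x, weights, d)
    return x


def receptive_field(kernel_size: int, dilations: List[int]) -> int:
    """Number of input snippets that can influence one output snippet."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    print(receptive_field(3, [1, 2, 4, 8]))  # 31: dilation doubles per layer
    print(receptive_field(3, [1, 1, 1, 1]))  # 9: same depth without dilation

    # Empirical check: push a unit impulse through the stack and count how many
    # output positions it reaches (all-positive weights, so nothing cancels).
    impulse = [0.0] * 40
    impulse[20] = 1.0
    y = dilated_stack(impulse, [1.0, 1.0, 1.0], [1, 2, 4, 8])
    print(sum(1 for v in y if v != 0.0))  # 31
```

Under these assumptions, four dilated layers see 31 snippets while four ordinary layers see only 9, which is the sense in which stacking dilated convolutions expands the receptive field "rapidly": growth is exponential in depth rather than linear.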
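The abstract does not give the FA objective. As a hedged illustration of mutual learning between the two streams, the sketch below assumes each stream outputs per-snippet class logits and scores how far the RGB predictions are from the optical flow predictions with KL(p_flow || p_rgb), treating the flow stream as the teacher because the abstract describes calibrating RGB with flow. The real FA module may instead align features or attention weights, or use a symmetric loss as in generic deep mutual learning; all names here are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one snippet's class logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: List[float], q: List[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def flow_to_rgb_alignment_loss(rgb_logits: List[List[float]],
                               flow_logits: List[List[float]]) -> float:
    """Mean over snippets of KL(p_flow || p_rgb).

    Hypothetical stand-in for an FA objective: the flow stream acts as the
    teacher. In an autograd framework the flow distribution would be detached
    so that only the RGB branch is pulled toward it.
    """
    if not rgb_logits or len(rgb_logits) != len(flow_logits):
        raise ValueError("expected equal, non-empty lists of per-snippet logits")
    total = 0.0
    for r, f in zip(rgb_logits, flow_logits):
        total += kl_divergence(softmax(f), softmax(r))
    return total / len(rgb_logits)
```

The loss is zero when both streams predict the same class distribution for every snippet and grows as the RGB predictions drift from the flow predictions; making it one-directional is the design choice that lets the optimized flow features act as the reference.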
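The t-IoU thresholds in the abstract refer to the temporal intersection over union between a predicted segment and a ground-truth segment; a prediction counts as a correct localization only if its t-IoU reaches the threshold. A minimal sketch, with hypothetical function names and segments given as (start, end) pairs, that ignores class labels and the one-to-one matching used when computing mAP:

```python
from typing import Tuple

Segment = Tuple[float, float]  # (start, end), e.g. in seconds


def t_iou(pred: Segment, gt: Segment) -> float:
    """Temporal intersection over union of two 1-D segments."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def is_hit(pred: Segment, gt: Segment, threshold: float) -> bool:
    """A prediction matches a ground-truth instance if t-IoU >= threshold."""
    return t_iou(pred, gt) >= threshold


if __name__ == "__main__":
    pred, gt = (2.0, 6.0), (4.0, 10.0)
    print(t_iou(pred, gt))        # 0.25
    print(is_hit(pred, gt, 0.1))  # True
    print(is_hit(pred, gt, 0.5))  # False
```

Here the segments overlap for 2 s out of a combined span of 8 s, so the prediction is a hit at threshold 0.1 but a miss at 0.5: a low threshold accepts loosely placed boundaries, while a high threshold rewards precise ones.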
Pages: 23