Temporal Feature Enhancement Dilated Convolution Network for Weakly-supervised Temporal Action Localization

Cited by: 8

Authors:

Zhou, Jianxiong [1]

Wu, Ying [1]

Affiliation:

[1] Northwestern Univ, Dept Elect & Comp Engn, Evanston, IL 60208 USA

Source:

2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV) | 2023

Keywords:

DOI:

10.1109/WACV56688.2023.00597

Chinese Library Classification number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Weakly-supervised Temporal Action Localization (WTAL) aims to classify and localize action instances in untrimmed videos with only video-level labels. Existing methods typically use snippet-level RGB and optical flow features extracted from pre-trained extractors directly. Because of two limitations: the short temporal span of snippets and the inappropriate initial features, these WTAL methods suffer from the lack of effective use of temporal information and have limited performance. In this paper, we propose the Temporal Feature Enhancement Dilated Convolution Network (TFE-DCN) to address these two limitations. The proposed TFE-DCN has an enlarged receptive field that covers a long temporal span to observe the full dynamics of action instances, which makes it powerful to capture temporal dependencies between snippets. Furthermore, we propose the Modality Enhancement Module that can enhance RGB features with the help of enhanced optical flow features, making the overall features appropriate for the WTAL task. Experiments conducted on THUMOS'14 and ActivityNet v1.3 datasets show that our proposed approach far outperforms state-of-the-art WTAL methods.
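The abstract's point about an enlarged receptive field can be illustrated with a small calculation. The sketch below is not the TFE-DCN architecture: the kernel size of 3 and the dilation rates 1, 2, 4 are assumptions chosen for illustration, and `receptive_field` is a hypothetical helper. It only shows how stacking stride-1 dilated 1D convolutions widens the temporal span (counted in snippets) that each output position can see, compared with undilated layers of the same depth.

```python
def receptive_field(kernel_size, dilations):
    """Temporal receptive field, in snippets, of stacked stride-1 1D convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


# Assumed values for illustration only; the abstract does not state them.
plain = receptive_field(3, [1, 1, 1])    # three undilated layers
dilated = receptive_field(3, [1, 2, 4])  # same depth, growing dilation

print(f"undilated: {plain} snippets, dilated: {dilated} snippets")
```

With the same number of layers and parameters, the dilated stack in this example covers roughly twice as many snippets, which is the general mechanism behind covering a longer temporal span.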


Pages: 6017-6026

Page count: 10

