Temporal Feature Enhancement Dilated Convolution Network for Weakly-supervised Temporal Action Localization

被引:8
作者
Zhou, Jianxiong [1 ]
Wu, Ying [1 ]
机构
[1] Northwestern Univ, Dept Elect & Comp Engn, Evanston, IL 60208 USA
来源
2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV) | 2023年
关键词
D O I
10.1109/WACV56688.2023.00597
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Weakly-supervised Temporal Action Localization (WTAL) aims to classify and localize action instances in untrimmed videos with only video-level labels. Existing methods typically use snippet-level RGB and optical flow features extracted from pre-trained extractors directly. Because of two limitations: the short temporal span of snippets and the inappropriate initial features, these WTAL methods suffer from the lack of effective use of temporal information and have limited performance. In this paper, we propose the Temporal Feature Enhancement Dilated Convolution Network (TFE-DCN) to address these two limitations. The proposed TFE-DCN has an enlarged receptive field that covers a long temporal span to observe the full dynamics of action instances, which makes it powerful to capture temporal dependencies between snippets. Furthermore, we propose the Modality Enhancement Module that can enhance RGB features with the help of enhanced optical flow features, making the overall features appropriate for the WTAL task. Experiments conducted on THUMOS'14 and ActivityNet v1.3 datasets show that our proposed approach far outperforms state-of-the-art WTAL methods.
引用
收藏
页码:6017 / 6026
页数:10
相关论文
共 42 条
  • [1] MS-TCN: Multi-Stage Temporal Convolutional Network for Action Segmentation
    Abu Farha, Yazan
    Gall, Juergen
    [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 3570 - 3579
  • [2] Heilbron FC, 2015, PROC CVPR IEEE, P961, DOI 10.1109/CVPR.2015.7298698
  • [3] Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
    Carreira, Joao
    Zisserman, Andrew
    [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 4724 - 4733
  • [4] Rethinking the Faster R-CNN Architecture for Temporal Action Localization
    Chao, Yu-Wei
    Vijayanarasimhan, Sudheendra
    Seybold, Bryan
    Ross, David A.
    Deng, Jia
    Sukthankar, Rahul
    [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 1130 - 1139
  • [5] Temporal Context Network for Activity Localization in Videos
    Dai, Xiyang
    Singh, Bharat
    Zhang, Guyue
    Davis, Larry S.
    Chen, Yan Qiu
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 5727 - 5736
  • [6] Fan Ma, 2020, Computer Vision - ECCV 2020. 16th European Conference. Proceedings. Lecture Notes in Computer Science (LNCS 12349), P420, DOI 10.1007/978-3-030-58548-8_25
  • [7] Fine-grained Temporal Contrastive Learning for Weakly-supervised Temporal Action Localization
    Gao, Junyu
    Chen, Mengyuan
    Xu, Changsheng
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 19967 - 19977
  • [8] ASM-Loc: Action-aware Segment Modeling for Weakly-Supervised Temporal Action Localization
    He, Bo
    Yang, Xitong
    Kang, Le
    Cheng, Zhiyu
    Zhou, Xin
    Shrivastava, Abhinav
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 13915 - 13925
  • [9] Cross-modal Consensus Network forWeakly Supervised Temporal Action Localization
    Hong, Fa-Ting
    Feng, Jia-Chang
    Xu, Dan
    Shan, Ying
    Zheng, Wei-Shi
    [J]. PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 1591 - 1599
  • [10] A Survey on Visual Content-Based Video Indexing and Retrieval
    Hu, Weiming
    Xie, Nianhua
    Li, Li
    Zeng, Xianglin
    Maybank, Stephen
    [J]. IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART C-APPLICATIONS AND REVIEWS, 2011, 41 (06): : 797 - 819