MODAL CONSENSUS AND CONTEXTUAL SEPARATION FOR WEAKLY SUPERVISED TEMPORAL ACTION LOCALIZATION

Cited by: 0
Authors
Liu, Peng [1 ]
Wang, Chuanxu [1 ]
Zhao, Min [1 ]
Affiliations
[1] Qingdao Univ Sci & Technol, Qingdao 266061, Peoples R China
Source
2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024 | 2024
Keywords
Weakly supervised learning; Temporal action localization; Cross-modal collaboration; Spatiotemporal self-attention; Hybrid modeling mechanism;
DOI
10.1109/ICASSP48485.2024.10446233
Abstract
Weakly-supervised Temporal Action Localization (W-TAL) is a challenging task that aims to identify action classes and localize their temporal boundaries using only video-level labels. Recent methods rely on simple cascading or integration of appearance and optical-flow features, which often yields incomplete action localization and ambiguity in distinguishing foreground from background. This paper therefore introduces the Modal Consensus and Context Separation (MCCS) approach to address these issues. First, a modal collaboration module enhances action feature representation by synergizing appearance and optical-flow features while discarding redundant elements to avoid suboptimal outcomes. The augmented bimodal streams are then fused by a spatiotemporal self-attention module that captures the spatial and temporal relationships of action snippets. In addition, a hybrid modeling mechanism performs foreground-background separation, focusing on local action attributes within the hybrid features to sharpen the distinction between foreground and background. Rigorous experiments on the THUMOS14 and ActivityNet1.3 datasets demonstrate the effectiveness of MCCS and its superiority in tackling the challenges of W-TAL.
Pages: 4220-4224
Page count: 5
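The first two modules described in the abstract — cross-modal filtering of appearance and optical-flow streams, followed by self-attention fusion over snippets — can be illustrated with a minimal numpy sketch. This is a hypothetical toy implementation based only on the abstract, not the authors' released code: the gating scheme (each modality re-weighting the other's channels via a sigmoid of its temporal average) and the single-head scaled dot-product attention are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_filter(rgb, flow):
    """Hypothetical modal-collaboration step: each modality gates the
    other's channels, suppressing channels judged redundant.
    rgb, flow: (T, C) snippet features for T snippets, C channels."""
    g_rgb = rgb.mean(axis=0)                    # (C,) global context
    g_flow = flow.mean(axis=0)
    gate_rgb = 1.0 / (1.0 + np.exp(-g_flow))    # flow gates rgb channels
    gate_flow = 1.0 / (1.0 + np.exp(-g_rgb))    # rgb gates flow channels
    return rgb * gate_rgb, flow * gate_flow

def temporal_self_attention(x):
    """Plain scaled dot-product self-attention over the snippet axis,
    standing in for the paper's spatiotemporal self-attention module."""
    T, C = x.shape
    attn = softmax(x @ x.T / np.sqrt(C), axis=-1)  # (T, T) snippet affinities
    return attn @ x                                # context-fused features

rng = np.random.default_rng(0)
T, C = 8, 16                                    # 8 snippets, 16-dim features
rgb = rng.standard_normal((T, C))
flow = rng.standard_normal((T, C))

rgb_f, flow_f = cross_modal_filter(rgb, flow)
fused = temporal_self_attention(np.concatenate([rgb_f, flow_f], axis=1))
print(fused.shape)
```

Downstream, a foreground-background separation head would score each of the `T` fused snippet features; the sketch stops at the fused representation since the abstract gives no further detail on the hybrid modeling mechanism.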