Spatial Enhancement and Temporal Constraint for Weakly Supervised Action Localization

Cited by: 5

Authors:

Qin, Xiaolei [1]

Ge, Yongxin [1]

Yu, Hui [2]

Chen, Feiyu [1]

Yang, Dan [1]

Affiliations:

[1] Chongqing Univ, Sch Big Data & Software Engn, Chongqing 401331, Peoples R China

[2] Univ Portsmouth, Portsmouth PO1 2DJ, Hants, England

Source:

IEEE SIGNAL PROCESSING LETTERS | 2020 / Vol. 27

Keywords:

Training; Proposals; Feature extraction; Two dimensional displays; Entropy; Signal processing; Signal processing algorithms; Weakly supervised temporal action localization; spatial enhancement; instance sparse constraint; confidence connectivity enhancement;

DOI:

10.1109/LSP.2020.3018914

Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Subject classification codes:

0808 ; 0809 ;

Abstract:

Weakly supervised temporal action localization (WSTAL) is a practical but challenging issue in video understanding. However, without boundary annotations, most existing methods end up activating background snippets or deactivating action snippets, which inevitably affects the localization of action instances. In this letter, we propose a spatial enhancement and temporal constraint (SETC) model to address this problem from three aspects. Specifically, we first propose a spatial enhancement module to enhance the discrimination of the extracted features. Then we leverage the instance sparse constraint to restrain drastic fluctuations in the class activation sequence (CAS). Finally, we use the confidence connectivity enhancement to connect the snippets that are broken up by mistake. Experiments on the THUMOS'14 and ActivityNet datasets validate the efficacy of SETC against existing state-of-the-art WSTAL algorithms.
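The idea of reconnecting snippets that were broken up by mistake can be illustrated with a generic gap-filling step on a thresholded class activation sequence. The sketch below is only an illustration under stated assumptions and is not the paper's confidence connectivity enhancement: the function names, the fixed threshold, and the `max_gap` parameter are all hypothetical, and the abstract does not specify how the authors implement this step.

```python
def segments_above(scores, threshold):
    """Return [start, end) index pairs of consecutive snippets scoring >= threshold."""
    segments = []
    start = None
    for i, s in enumerate(scores):
        if s >= threshold and start is None:
            start = i                      # a new candidate instance begins
        elif s < threshold and start is not None:
            segments.append((start, i))    # the instance ends before snippet i
            start = None
    if start is not None:
        segments.append((start, len(scores)))
    return segments


def merge_close_segments(segments, max_gap):
    """Join neighbouring segments separated by at most max_gap low-score snippets.

    Assumption: a short dip between two activated runs is treated as a
    spurious break rather than a true boundary.
    """
    merged = []
    for seg in segments:
        if merged and seg[0] - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], seg[1])
        else:
            merged.append(seg)
    return merged


# Toy class activation sequence for one action class (made-up values).
cas = [0.1, 0.8, 0.9, 0.3, 0.85, 0.9, 0.1, 0.1, 0.1, 0.7]
raw = segments_above(cas, threshold=0.5)
joined = merge_close_segments(raw, max_gap=1)
print(raw)     # [(1, 3), (4, 6), (9, 10)]
print(joined)  # [(1, 6), (9, 10)]
```

In this toy sequence the single low-score snippet at index 3 splits one plausible instance into two proposals; merging across a gap of one snippet restores it, while the longer gap before index 9 is kept as a real boundary.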


Pages: 1520-1524

Page count: 5
