Spatio-Temporal Action Detector with Self-Attention

Cited by: 5
Authors
Ma, Xurui [1 ]
Luo, Zhigang [1 ,2 ]
Zhang, Xiang [1 ,3 ,4 ]
Liao, Qing [5 ]
Shen, Xingyu [1 ]
Wang, Mengzhu [1 ]
Affiliations
[1] Natl Univ Def Technol, Coll Comp, Changsha 410073, Peoples R China
[2] Natl Univ Def Technol, Sci & Technol Parallel & Distributed Lab, Changsha 410073, Hunan, Peoples R China
[3] Natl Univ Def Technol, Inst Quantum, Changsha 410073, Hunan, Peoples R China
[4] Natl Univ Def Technol, State Key Lab High Performance Comp, Changsha 410073, Hunan, Peoples R China
[5] Harbin Inst Technol Shenzhen, Dept Comp Sci & Technol, Shenzhen 518055, Peoples R China
Source
2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | 2021
Funding
National Natural Science Foundation of China;
Keywords
Spatio-temporal action detection; self-attention; tubelets link algorithm;
DOI
10.1109/IJCNN52387.2021.9533300
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In the field of spatio-temporal action detection, some recent studies attempt to solve the action detection problem with one-stage, anchor-free object detectors. Although efficient, these detectors still leave room for performance improvement. Towards this goal, a Self-Attention MovingCenter Detector (SAMOC) is proposed, which offers two attractive aspects: 1) to effectively capture motion cues, a spatio-temporal self-attention block reinforces the feature representation by aggregating motion-dependent global contexts, and 2) a link branch models frame-level object dependencies, which promotes the confidence scores of correct actions. Experiments on two benchmark datasets show that SAMOC with these two components achieves state-of-the-art performance and runs in real time.
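The abstract describes the spatio-temporal self-attention block only at a high level. As a rough illustration, not the authors' SAMOC code, the following minimal PyTorch sketch shows one common way such a block can aggregate global context over all T x H x W positions of a clip feature map; the class name `SpatioTemporalSelfAttention`, the channel-reduction factor, and the learnable residual weight `gamma` are assumptions made for this example.

```python
# Illustrative sketch only: a generic spatio-temporal self-attention block in the
# spirit of the paper's description (global context over T x H x W positions).
# Layer sizes and the residual weight `gamma` are assumptions, not SAMOC itself.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatioTemporalSelfAttention(nn.Module):
    """Self-attention over all spatio-temporal positions of a clip feature map."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        inner = max(channels // reduction, 1)
        # 1x1x1 convolutions produce query/key/value embeddings.
        self.query = nn.Conv3d(channels, inner, kernel_size=1)
        self.key = nn.Conv3d(channels, inner, kernel_size=1)
        self.value = nn.Conv3d(channels, channels, kernel_size=1)
        # Learnable residual weight, initialized to 0 so the block starts as identity.
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, T, H, W) clip features from the backbone.
        b, c, t, h, w = x.shape
        n = t * h * w
        q = self.query(x).view(b, -1, n).permute(0, 2, 1)     # (B, N, C')
        k = self.key(x).view(b, -1, n)                         # (B, C', N)
        v = self.value(x).view(b, c, n)                        # (B, C, N)
        # Pairwise affinities between all spatio-temporal positions.
        attn = F.softmax(q @ k / (k.shape[1] ** 0.5), dim=-1)  # (B, N, N)
        context = (v @ attn.transpose(1, 2)).view(b, c, t, h, w)
        return x + self.gamma * context


if __name__ == "__main__":
    feats = torch.randn(2, 64, 7, 18, 18)        # (batch, channels, frames, H, W)
    out = SpatioTemporalSelfAttention(64)(feats)
    print(out.shape)                              # torch.Size([2, 64, 7, 18, 18])
```

The zero-initialized `gamma` is a common design choice for such residual attention blocks: the module behaves as an identity mapping at the start of training and gradually learns how much global context to inject.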
Pages: 8