Spatio-Temporal Action Detector with Self-Attention

Cited by: 5
Authors
Ma, Xurui [1 ]
Luo, Zhigang [1 ,2 ]
Zhang, Xiang [1 ,3 ,4 ]
Liao, Qing [5 ]
Shen, Xingyu [1 ]
Wang, Mengzhu [1 ]
Affiliations
[1] Natl Univ Def Technol, Coll Comp, Changsha 410073, Peoples R China
[2] Natl Univ Def Technol, Sci & Technol Parallel & Distributed Lab, Changsha 410073, Hunan, Peoples R China
[3] Natl Univ Def Technol, Inst Quantum, Changsha 410073, Hunan, Peoples R China
[4] Natl Univ Def Technol, State Key Lab High Performance Comp, Changsha 410073, Hunan, Peoples R China
[5] Harbin Inst Technol Shenzhen, Dept Comp Sci & Technol, Shenzhen 518055, Peoples R China
Source
2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | 2021
Funding
National Natural Science Foundation of China;
Keywords
Spatio-temporal action detection; self-attention; tubelets link algorithm;
DOI
10.1109/IJCNN52387.2021.9533300
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In the field of spatio-temporal action detection, some recent studies attempt to solve the action detection problem with one-stage, anchor-free object detectors. Although efficient, these detectors still leave room for performance improvement. Towards this goal, a Self-Attention MovingCenter Detector (SAMOC) is proposed, which offers two attractive aspects: 1) to effectively capture motion cues, a spatio-temporal self-attention block reinforces the feature representation by aggregating motion-dependent global contexts, and 2) a link branch models frame-level object dependencies, which promotes the confidence scores of correct actions. Experiments on two benchmark datasets show that SAMOC with these two components achieves state-of-the-art performance and runs in real time.
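The abstract describes the spatio-temporal self-attention block only at a high level. As a rough illustration, not the authors' SAMOC code, the following minimal PyTorch sketch shows one common way such a block can aggregate global context over all T x H x W positions of a clip feature map; the class name `SpatioTemporalSelfAttention`, the channel-reduction factor, and the learnable residual weight `gamma` are assumptions made for this example.

```python
# Illustrative sketch only: a generic spatio-temporal self-attention block in the
# spirit of the paper's description (global context over T x H x W positions).
# Layer sizes and the residual weight `gamma` are assumptions, not SAMOC itself.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatioTemporalSelfAttention(nn.Module):
    """Self-attention over all spatio-temporal positions of a clip feature map."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        inner = max(channels // reduction, 1)
        # 1x1x1 convolutions produce query/key/value embeddings.
        self.query = nn.Conv3d(channels, inner, kernel_size=1)
        self.key = nn.Conv3d(channels, inner, kernel_size=1)
        self.value = nn.Conv3d(channels, channels, kernel_size=1)
        # Learnable residual weight, initialized to 0 so the block starts as identity.
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, T, H, W) clip features from the backbone.
        b, c, t, h, w = x.shape
        n = t * h * w
        q = self.query(x).view(b, -1, n).permute(0, 2, 1)     # (B, N, C')
        k = self.key(x).view(b, -1, n)                         # (B, C', N)
        v = self.value(x).view(b, c, n)                        # (B, C, N)
        # Pairwise affinities between all spatio-temporal positions.
        attn = F.softmax(q @ k / (k.shape[1] ** 0.5), dim=-1)  # (B, N, N)
        context = (v @ attn.transpose(1, 2)).view(b, c, t, h, w)
        return x + self.gamma * context


if __name__ == "__main__":
    feats = torch.randn(2, 64, 7, 18, 18)        # (batch, channels, frames, H, W)
    out = SpatioTemporalSelfAttention(64)(feats)
    print(out.shape)                              # torch.Size([2, 64, 7, 18, 18])
```

The zero-initialized `gamma` is a common design choice for such residual attention blocks: the module behaves as an identity mapping at the start of training and gradually learns how much global context to inject.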
Pages: 8