Multiscale spatial temporal attention graph convolution network for skeleton-based anomaly behavior detection

Cited by: 15

Authors:

Chen, Xiaoyu [1,2]

Kan, Shichao [3]

Zhang, Fanghui [1,2]

Cen, Yigang [1,2]

Zhang, Linna [4]

Zhang, Damin [5]

Affiliations:

[1] Beijing Jiaotong Univ, Inst Informat Sci, Beijing 100044, Peoples R China

[2] Beijing Key Lab Adv Informat Sci & Network Technol, Beijing 100044, Peoples R China

[3] Cent South Univ, Sch Comp Sci & Engn, Changsha 410083, Hunan, Peoples R China

[4] Guizhou Univ, Coll Mech Engn, Guiyang 550025, Guizhou, Peoples R China

[5] Guizhou Univ, Coll Big Data & Informat Engn, Guiyang 550025, Guizhou, Peoples R China

Source:

JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION | 2023 / Vol. 90

Funding:

National Key Research and Development Program of China; National Natural Science Foundation of China;

Keywords:

Multiscale spatial temporal graph; Spatial attention graph convolution; Skeleton-based anomaly behavior detection; NEURAL-NETWORKS;

DOI:

10.1016/j.jvcir.2022.103707

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Anomaly behavior detection plays a significant role in emergencies such as robbery. Although many methods have been proposed to address this problem, their performance in real applications is still relatively low. Here, to detect abnormal human behavior in videos, we propose a multiscale spatial temporal attention graph convolution network (MSTA-GCN) to capture and cluster the features of the human skeleton. First, based on the human skeleton graph, a multiscale spatial temporal attention graph convolution block (MSTA-GCB) is built, which contains multiscale graphs in the temporal and spatial dimensions. MSTA-GCB can simulate the motion relations of human body components at different scales, where each scale corresponds to a different granularity of annotation on the human skeleton. Then, static, globally-learned and attention-based adjacency matrices in the graph convolution module are proposed to capture hierarchical representations. Finally, extensive experiments are carried out on the ShanghaiTech Campus and CUHK Avenue datasets; the final frame-level AUC/EER results are 0.759/0.311 and 0.876/0.192, respectively. Moreover, the frame-level AUC is 0.768 for the human-related ShanghaiTech subset. These results show that our MSTA-GCN outperforms most methods in video anomaly detection and achieves a new state-of-the-art performance in skeleton-based anomaly behavior detection.
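The abstract names three kinds of adjacency matrix (static, globally-learned, attention-based) but does not give the formulation. The following is a minimal NumPy sketch of one plausible reading, assuming the three matrices are simply summed before a single spatial graph convolution on one frame. The function names, the dot-product attention, the summation, and the toy 5-joint chain are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def normalize_adjacency(a):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 of a fixed skeleton graph."""
    a_hat = a + np.eye(a.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_adjacency(x, w_q, w_k):
    """Data-dependent adjacency from scaled dot-product similarity between joints."""
    q, k = x @ w_q, x @ w_k                  # (V, d) each
    return softmax(q @ k.T / np.sqrt(q.shape[1]), axis=-1)

def spatial_graph_conv(x, a_static, a_global, w_q, w_k, w_out):
    """One spatial graph convolution over V joints with C input channels.

    Assumption: the static, globally-learned and attention-based adjacency
    matrices are combined by summation; the paper may combine them differently.
    """
    a = normalize_adjacency(a_static) + a_global + attention_adjacency(x, w_q, w_k)
    return np.maximum(a @ x @ w_out, 0.0)    # ReLU

# Toy example: a hypothetical 5-joint chain with 3 input channels per joint.
rng = np.random.default_rng(0)
num_joints, c_in, c_out, d = 5, 3, 8, 4
a_static = np.zeros((num_joints, num_joints))
for i in range(num_joints - 1):
    a_static[i, i + 1] = a_static[i + 1, i] = 1.0
x = rng.normal(size=(num_joints, c_in))
a_global = np.zeros((num_joints, num_joints))  # would be a trained parameter
w_q = rng.normal(size=(c_in, d))
w_k = rng.normal(size=(c_in, d))
w_out = rng.normal(size=(c_in, c_out))
out = spatial_graph_conv(x, a_static, a_global, w_q, w_k, w_out)
print(out.shape)  # (5, 8)
```

In a trained model, `a_global` and the weight matrices would be learned parameters; here they are fixed placeholders so the sketch runs standalone.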

References

Pages: 9

49 entries in total

[1] Latent Space Autoregression for Novelty Detection [J].

Abati, Davide ;

Porrello, Angelo ;

Calderara, Simone ;

Cucchiara, Rita .

2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, :481-490

[2] Antic B, 2011, IEEE I CONF COMP VIS, P2415, DOI 10.1109/ICCV.2011.6126525

[3] Cao Z., 2017, P IEEE C COMP VIS PA, P7291

[4] Video anomaly detection with spatio-temporal dissociation [J].

Chang, Yunpeng ;

Tu, Zhigang ;

Xie, Wei ;

Luo, Bin ;

Zhang, Shifu ;

Sui, Haigang ;

Yuan, Junsong .

PATTERN RECOGNITION, 2022, 122

[5] Chang YP, 2020, Img Proc Comp Vis Re, V12360, P329, DOI 10.1007/978-3-030-58555-6_20

[6] Abnormal Event Detection in Videos Using Spatiotemporal Autoencoder [J].

Chong, Yong Shean ;

Tay, Yong Haur .

ADVANCES IN NEURAL NETWORKS, PT II, 2017, 10262 :189-196

[7] Learning Dynamic Relationships for 3D Human Motion Prediction [J].

Cui, Qiongjie ;

Sun, Huaijiang ;

Yang, Fei .

2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, :6518-6526

[8] Online anomaly detection in surveillance videos with asymptotic bound on false alarm rate [J].

Doshi, Keval ;

Yilmaz, Yasin .

PATTERN RECOGNITION, 2021, 114

[9] RMPE: Regional Multi-Person Pose Estimation [J].

Fang, Hao-Shu ;

Xie, Shuqin ;

Tai, Yu-Wing ;

Lu, Cewu .

2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, :2353-2362

[10] Multi-Encoder Towards Effective Anomaly Detection in Videos [J].

Fang, Zhiwen ;

Zhou, Joey Tianyi ;

Xiao, Yang ;

Li, Yanan ;

Yang, Feng .

IEEE TRANSACTIONS ON MULTIMEDIA, 2021, 23 :4106-4116
