Feature Reconstruction With Disruption for Unsupervised Video Anomaly Detection

Cited by: 5
Authors
Tao, Chenchen [1]
Wang, Chong [1]
Lin, Sunqi [1]
Cai, Suhang [1]
Li, Di [1]
Qian, Jiangbo [1]
Affiliations
[1] Ningbo Univ, Fac Elect Engn & Comp Sci, Ningbo 315211, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Training; Benchmark testing; Transformers; Feature extraction; Robustness; Anomaly detection; Cross attention; feature reconstruction; transformer; unsupervised video anomaly detection; STRESS;
DOI
10.1109/TMM.2024.3405716
CLC Classification Number
TP [Automation Technology; Computer Technology];
Discipline Classification Code
0812;
Abstract
Unsupervised video anomaly detection (UVAD) has gained significant attention due to its label-free nature. UVAD methods typically fall into two branches, i.e., one-class classification (OCC) methods and fully unsupervised ones. However, the former may suffer from data imbalance and high false alarm rates, while the latter relies heavily on feature representation and pseudo-labels. In this paper, a novel feature reconstruction and disruption model (FRD-UVAD) is proposed for effective feature refinement and better pseudo-label generation in fully UVAD, built on cascade cross-attention transformers, a latent anomaly memory bank and an auxiliary scorer. Clip features are reconstructed using space-time intra-clip information as well as inter-clip knowledge. Moreover, instead of blindly reconstructing all training features as OCC methods do, a new disruption process is proposed to operate alongside the feature reconstruction. Using the collected pseudo-anomaly samples, this process emphasizes the feature differences between normal and abnormal events. Additionally, a pre-trained UVAD scorer serves as a complementary criterion for anomaly prediction, further refining the pseudo-labels. To demonstrate its effectiveness, comprehensive experiments and detailed ablation studies are conducted on three video benchmarks, namely CUHK Avenue, ShanghaiTech and UCF-Crime. The proposed FRD-UVAD achieves the best AUC performance (91.23%, 80.14%, and 82.12%) on the three datasets respectively, surpassing other state-of-the-art OCC and fully UVAD methods. Furthermore, it obtains the lowest false alarm rate with lower scene dependency than other OCC methods.
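The abstract describes the method only at a high level; as a rough illustration of its two core ideas (cross-attention feature reconstruction and disruption of pseudo-anomalous features), the following minimal PyTorch sketch may help. It is not the authors' implementation: the module, the hinge-style disruption loss, and all names (CrossAttnReconstructor, recon_disrupt_loss, margin, bank, etc.) are hypothetical stand-ins.

```python
# Minimal sketch (NOT the authors' code) of the ideas in the FRD-UVAD abstract:
# clip features are reconstructed via cross-attention to a feature bank, and
# pseudo-anomalous clips are "disrupted" (pushed away from good reconstruction).
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossAttnReconstructor(nn.Module):
    """Reconstructs clip-level features by cross-attending to a feature bank."""

    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, dim * 2), nn.GELU(), nn.Linear(dim * 2, dim)
        )

    def forward(self, clip_feats: torch.Tensor, bank: torch.Tensor) -> torch.Tensor:
        # clip_feats: (B, T, D) intra-clip tokens; bank: (B, M, D) features from
        # other clips (a hypothetical stand-in for the latent anomaly memory bank).
        attended, _ = self.attn(query=clip_feats, key=bank, value=bank)
        h = self.norm1(clip_feats + attended)
        return self.norm2(h + self.ffn(h))


def recon_disrupt_loss(recon, target, pseudo_label, margin: float = 1.0):
    # pseudo_label: (B,) float tensor, 1.0 for pseudo-anomalous clips, 0.0 otherwise.
    err = F.mse_loss(recon, target, reduction="none").mean(dim=(1, 2))  # (B,)
    loss_recon = (1.0 - pseudo_label) * err              # pull normal clips close
    loss_disrupt = pseudo_label * F.relu(margin - err)   # push anomalies apart
    return (loss_recon + loss_disrupt).mean()


if __name__ == "__main__":
    model = CrossAttnReconstructor(dim=512)
    clip = torch.randn(4, 16, 512)   # 4 clips, 16 tokens each
    bank = torch.randn(4, 32, 512)   # 32 bank features per clip
    labels = torch.tensor([0.0, 0.0, 1.0, 0.0])
    loss = recon_disrupt_loss(model(clip, bank), clip, labels)
    loss.backward()
```

The hinge term mirrors the abstract's stated goal of widening the gap between normal and pseudo-anomalous events, so that reconstruction error can serve directly as an anomaly score.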
Pages: 10160-10173
Page count: 14