Video anomaly detection is the task of detecting low-probability anomalous events amid high-probability normal behaviour. This data imbalance makes it essential to distinguish effectively between the normal and abnormal behaviour of foreground targets. Much current research has focused on generating predictions that more closely match the acquired frames, while the need to separate normal from abnormal behaviour has received little attention. To address this problem, we propose a spatio-temporal enhancement network, a prediction-based unsupervised learning approach to video anomaly detection. To capture properties shared among foreground targets, we design a correlation enhancement module that quantifies the spatial information of normal foreground targets. We also design a temporal information enhancement module that stabilises the temporal information of normal foreground targets, which reduces the effect of background variation and improves the robustness of the network. To sharpen the boundary between normal and abnormal targets, abnormal target feature vectors are represented by a poorer substitution and normal target feature vectors by a superior substitution. In this way, the network can localise the time at which an anomaly occurs clearly and accurately. Extensive experiments on the UCSD Ped1, UCSD Ped2, and Avenue datasets yield average AUC scores of 82.7, 97.6, and 89.6, respectively.
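The abstract describes a prediction-based detector evaluated by AUC but does not state how prediction quality becomes an anomaly score. The following is a minimal sketch under an assumption: it uses the PSNR-based regularity score that is common in future-frame-prediction methods, not necessarily the scoring rule of this network. A poorly predicted frame has a low PSNR, so after per-video min-max normalisation it receives a high anomaly score. Frame-level AUC is then the probability that a randomly chosen abnormal frame outscores a randomly chosen normal one. The function names and the per-frame MSE inputs are hypothetical.

```python
import math
from typing import List, Sequence


def psnr(mse: float, max_val: float = 1.0) -> float:
    """PSNR (dB) between a predicted and an actual frame, given their mean squared error.

    Assumes pixel values are scaled to [0, max_val] and mse > 0.
    """
    return 10.0 * math.log10(max_val ** 2 / mse)


def anomaly_scores(mses: Sequence[float]) -> List[float]:
    """Hypothetical scoring rule: min-max normalise PSNR over one video.

    Low PSNR (poor prediction) maps to a score near 1 (likely anomalous);
    high PSNR (good prediction) maps to a score near 0 (likely normal).
    """
    p = [psnr(m) for m in mses]
    lo, hi = min(p), max(p)
    span = hi - lo if hi > lo else 1.0  # avoid division by zero for constant PSNR
    return [1.0 - (v - lo) / span for v in p]


def frame_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Frame-level ROC AUC via pairwise comparison (label 1 = abnormal, ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy per-frame prediction errors; frames 3 and 4 are poorly predicted.
    mses = [0.001, 0.0012, 0.0011, 0.02, 0.03, 0.0009]
    labels = [0, 0, 0, 1, 1, 0]
    scores = anomaly_scores(mses)
    print(frame_auc(scores, labels))
```

In this toy example the two poorly predicted frames receive the highest scores, so the AUC is 1.0. The pairwise AUC is quadratic in the number of frames; for full benchmark videos a rank-based or library implementation would be the practical choice.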