As internet services evolve rapidly, companies across many industries face increasingly complex business operations, and their clusters have grown substantially in scale. This growth makes cluster management harder, and the vast amounts of data collected are used inefficiently because of their low value density. Based on the large-scale cluster virtualization and monitoring system of the data center of the Bureau of Geophysical Prospecting (BGP), this paper uses host-resource time series data from the monitoring system's time series database to propose a multivariate multi-step time series forecasting model, MUL-CNN-BiGRU-Attention, for forecasting CPU load on virtual cluster hosts. The model is first trained offline on a large volume of time series data and then deployed with TensorFlow Serving. Recent small-batch data are used to fine-tune the model parameters so that the model adapts to current data patterns. Comparative experiments against baseline models show improvements in Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and $R^{2}$ of up to 35.2%, 56.1%, 32.5%, and 10.3%, respectively. Ablation experiments further investigate how different factors affect the performance of the forecasting model, and their results provide guidance for parameter optimization.
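For readers unfamiliar with the four evaluation metrics named above, the following is a minimal sketch of how MAE, MSE, RMSE, and $R^{2}$ are conventionally computed for a forecast. It is not the paper's evaluation code: the function name `regression_metrics` and the sample CPU-load values are hypothetical and chosen only for illustration.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R^2 for two equal-length sequences.

    Hypothetical helper for illustration; uses the standard textbook
    definitions of the four metrics.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n   # mean absolute error
    mse = sum(e * e for e in errors) / n    # mean squared error
    rmse = math.sqrt(mse)                   # root mean squared error

    # R^2 = 1 - SS_res / SS_tot, where SS_tot is taken around the mean
    # of the observed values. It is undefined for a constant target.
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


# Made-up normalized CPU-load values over a four-step forecast horizon.
observed = [0.2, 0.4, 0.6, 0.8]
forecast = [0.25, 0.35, 0.65, 0.75]
print(regression_metrics(observed, forecast))
```

Lower MAE, MSE, and RMSE indicate a better fit, whereas a higher $R^{2}$ (closer to 1) is better, so an "improvement" in the comparison above means a decrease in the first three metrics and an increase in the last.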