Water pollution has serious consequences for human health and aquatic ecosystems, and accurate forecasting of water quality changes can aid in the early detection and treatment of such pollution. To represent the characteristics of water quality data well, a water quality prediction model must account for the non-smoothness of the data as well as the relationship between upstream and downstream water quality. First, Optimal Variational Mode Decomposition (OVMD) is used to pre-process the water quality monitoring data, linearly decomposing it into multiple components; each component is then predicted separately, which reduces the difficulty of prediction. Second, to reflect the relationship between upstream and downstream water quality, the temporal and spatial correlations of water quality data across monitoring stations are modeled jointly. Graph Attention Networks (GAT) are embedded in the reset gate and update gate of the Gated Recurrent Unit (GRU), so that the spatial features of the other stations are adaptively aggregated at each station to achieve multi-step prediction of water quality indicators. Finally, the predicted values of the components are fused to form the final predicted values. Validation at 13 monitoring stations along the Mulan River demonstrated the effectiveness of the integrated prediction model. The results showed that, under different prediction steps, the Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) values of the GAT-GRU water quality prediction model were reduced by about 11-20% (ammonia nitrogen, NH3-N) and 8-18% (total phosphorus, TP) compared with the Graph Convolutional Network (GCN)-GRU model, and the Nash-Sutcliffe efficiency coefficient (NSE) value was increased by about 20% (NH3-N) and 13% (TP).
The RMSE and MAE values of the OVMD-GAT-GRU water quality prediction model were further reduced by about 52-57% (NH3-N) and 25-35% (TP) compared with GAT-GRU, and the NSE value was increased by about 19-43% (NH3-N) and 15-33% (TP). As the number of prediction steps increased, the advantages of OVMD-GAT-GRU over GAT-GRU and GCN-GRU became more pronounced. These results demonstrate that OVMD-GAT-GRU offers superior prediction performance, robustness, accuracy, and adaptability across water quality indicators. The OVMD-GAT-GRU water quality prediction model can therefore provide significant assistance in the prevention and management of water pollution.
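
For readers who wish to reproduce comparisons of this kind, the three evaluation metrics named above can be computed as in the following minimal sketch. It uses the standard textbook definitions of RMSE, MAE, and NSE; the function names, the `relative_reduction` helper, and the sample values are illustrative assumptions and are not taken from the study's code or data.

```python
import math

def rmse(obs, pred):
    """Root Mean Square Error between observed and predicted series."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mae(obs, pred):
    """Mean Absolute Error between observed and predicted series."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the
    skill of always predicting the observed mean."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - p) ** 2 for o, p in zip(obs, pred))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance

def relative_reduction(baseline, candidate):
    """Percentage by which an error metric drops relative to a baseline
    (hypothetical helper for comparing two models)."""
    return 100.0 * (baseline - candidate) / baseline

# Made-up values for illustration only, not measurements from the study.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 2.0, 3.0, 3.0]
print(rmse(observed, predicted), mae(observed, predicted), nse(observed, predicted))
```

Lower RMSE and MAE indicate better predictions, whereas a higher NSE indicates a better fit, which is why the abstract reports reductions for the first two metrics and increases for the third.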