Wind energy is a clean energy source that can effectively reduce pollution. Efficient and accurate wind power forecasting helps optimize the operation and scheduling of wind farms, thereby improving energy utilization. Wind power output is closely tied to weather conditions, and both are recorded as time series. Most current time series forecasting models rely on multi-head self-attention mechanisms and gated recurrent units. However, self-attention is inherently insensitive to the order of its inputs, which can scramble the sequence and discard temporal information, while gated recurrent units are of limited use in capturing interactions among multiple variables, which makes significant gains in prediction accuracy difficult. To address these shortcomings, this study introduces a cross-attention mechanism into the core prediction algorithm and proposes a new forecasting module, the CA module, which captures information among variables while preserving the order of the time series. Furthermore, to handle the time lag between weather information and power data, this study designs a novel increment-based data alignment algorithm and combines it with variational mode decomposition to achieve multi-source feature alignment, further improving prediction accuracy. In addition, a reinforcement learning algorithm serves as the hyperparameter optimization framework during forecasting, determining the optimal hyperparameters for different tasks and thereby improving the model's accuracy and practicality. Ablation, comparison, and generalization experiments are conducted on weather data and the corresponding power data from four wind power stations in eastern China during spring. 
The results show that the proposed model improves wind power output forecasting accuracy by more than 15% across different time scales compared to current mainstream algorithms, with short- to medium-term prediction accuracy improvements of up to 58.1%.
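
The ordering issue attributed to self-attention above can be illustrated with a minimal sketch. It is not the proposed CA module: it assumes plain scaled dot-product self-attention with identity projections (Q = K = V) and no positional encoding, and the toy `series` values and all function names are hypothetical. Under these assumptions, shuffling the time steps only shuffles the output rows in the same way, so the layer by itself carries no information about temporal order.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with identity projections (Q = K = V = x).

    Assumption for illustration: no learned weights and no positional encoding.
    """
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


def rows_close(a, b, tol=1e-9):
    """True if two rows agree element-wise within an absolute tolerance."""
    return all(math.isclose(p, q, abs_tol=tol) for p, q in zip(a, b))


# Hypothetical toy series: 4 time steps, 2 variables (e.g. wind speed, power).
series = [[0.1, 1.0], [0.4, 0.8], [0.9, 0.3], [1.5, 0.1]]
perm = [2, 0, 3, 1]                      # an arbitrary reordering of the time steps
shuffled = [series[i] for i in perm]

out_original = self_attention(series)
out_shuffled = self_attention(shuffled)

# Row k of the shuffled output should match row perm[k] of the original output.
order_is_ignored = all(
    rows_close(out_shuffled[k], out_original[perm[k]]) for k in range(len(perm))
)
print(order_is_ignored)
```

In practice, Transformer-style forecasters add positional encodings to compensate for this property; the sketch only shows why some such mechanism is needed when temporal order matters.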