Enhancing the reliability of wind turbines (WTs) is essential for reducing operation and maintenance costs in wind farms. However, the difficulty of extracting spatiotemporal features from fault signals acquired in harsh environments, together with the limitations of traditional diagnostic methods that rely on a single signal source, hinders further improvement in diagnostic accuracy. To address these issues, we propose an end-to-end fault diagnosis method based on multisource signal fusion, implemented with a convolutional neural network-bidirectional gated recurrent unit (CNN-BiGRU). In this model, the CNN captures the spatial features of the data and the BiGRU processes their temporal features. A lightweight convolutional block attention module (CBAM) embedded in the network enhances its feature representation capability, enabling end-to-end fault diagnosis. Furthermore, Dezert-Smarandache theory (DSmT) is employed to integrate the preliminary diagnostic results obtained from acoustic, vibration, and supervisory control and data acquisition (SCADA) data, yielding a robust gearbox fault diagnosis. Comparative case studies demonstrate that the proposed model reduces diagnostic uncertainty compared with single-signal approaches. In addition, the model maintains stable diagnostic accuracy under both steady and variable operating conditions, as well as in various noise environments. These results validate the effectiveness and feasibility of the proposed method for practical applications.
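To make the decision-level fusion step more concrete, the following is a minimal Python sketch of combining per-source classifier outputs with the proportional conflict redistribution rule no. 5 (PCR5), a combination rule commonly used within DSmT. The abstract does not state which DSmT rule is used, so this is an assumption, as are the following: each source (acoustic, vibration, SCADA) outputs a probability vector over the same mutually exclusive fault classes; these vectors are treated as basic belief assignments on singletons only; and the three sources are fused sequentially. The function names `pcr5_pair` and `fuse_sources` and all numeric values are hypothetical.

```python
from functools import reduce
from typing import List, Sequence


def pcr5_pair(m1: Sequence[float], m2: Sequence[float]) -> List[float]:
    """Combine two belief assignments over the same classes with PCR5.

    Assumption (illustrative setup): mass is placed only on singleton,
    mutually exclusive classes, so any two distinct classes conflict.
    """
    if len(m1) != len(m2):
        raise ValueError("both sources must cover the same fault classes")
    n = len(m1)
    # Conjunctive part: both sources support the same class.
    fused = [m1[a] * m2[a] for a in range(n)]
    # Conflicting part: source 1 supports class a, source 2 supports class x.
    for a in range(n):
        for x in range(n):
            if a == x:
                continue
            conflict = m1[a] * m2[x]
            if conflict == 0.0:
                continue  # nothing to redistribute (also avoids a zero denominator)
            denom = m1[a] + m2[x]
            # PCR5: return the partial conflict to the two classes involved,
            # proportionally to the masses that produced it.
            fused[a] += m1[a] * conflict / denom
            fused[x] += m2[x] * conflict / denom
    return fused


def _check_bba(m: Sequence[float], tol: float = 1e-6) -> None:
    """Reject inputs that are not non-negative mass vectors summing to 1."""
    if any(v < 0 for v in m) or abs(sum(m) - 1.0) > tol:
        raise ValueError("each source must be a non-negative mass vector summing to 1")


def fuse_sources(sources: Sequence[Sequence[float]]) -> List[float]:
    """Fuse several sources pairwise in the given order.

    PCR5 is not associative, so sequential fusion is a simplification and
    the result can depend on the order of the sources.
    """
    for m in sources:
        _check_bba(m)
    fused = reduce(pcr5_pair, sources)
    total = sum(fused)
    return [v / total for v in fused]  # renormalize to absorb rounding error


if __name__ == "__main__":
    # Hypothetical softmax outputs over four illustrative gearbox states.
    acoustic = [0.1, 0.7, 0.1, 0.1]
    vibration = [0.2, 0.6, 0.1, 0.1]
    scada = [0.5, 0.3, 0.1, 0.1]
    fused = fuse_sources([acoustic, vibration, scada])
    predicted = max(range(len(fused)), key=fused.__getitem__)
    print([round(v, 3) for v in fused], "-> predicted class", predicted)
```

In this toy example, the SCADA-based classifier disagrees with the other two sources, yet the fused mass still concentrates on the class that the acoustic and vibration sources agree on. This illustrates how conflict redistribution can reduce the influence of a single unreliable source. Because PCR5 is not associative, changing the fusion order can change the fused masses.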