Apple yield estimation is a critical task in precision agriculture, complicated by complex tree canopy structures, growth-stage variability, and orchard heterogeneity. In this study, we apply multi-source feature fusion, combining vegetation indices from UAV remote sensing imagery, structural feature ratios from ground-based fruit tree images, and leaf chlorophyll content (SPAD), to improve apple yield estimation accuracy. A DeepLabv3+ network optimized with the Convolutional Block Attention Module (CBAM) and Efficient Channel Attention (ECA) was used to segment fruit tree images. Four structural feature ratios were extracted, visible-light and multispectral vegetation indices were calculated, and feature selection was performed using Pearson correlation analysis. Yield estimation models were constructed using k-nearest neighbors (KNN), partial least squares (PLS), random forest (RF), and support vector machine (SVM) algorithms under single and combined feature sets (vegetation indices, structural feature ratios, SPAD, vegetation indices + SPAD, vegetation indices + structural feature ratios, structural feature ratios + SPAD, and all three together). The optimized CBAM-ECA-DeepLabv3+ model achieved a mean Intersection over Union (mIoU) of 0.89, an 8% improvement over the baseline DeepLabv3+, outperforming U2Net and PSPNet. The SVM model based on multi-source feature fusion achieved the highest apple yield estimation accuracy on small-scale orchard sample plots (R² = 0.942, RMSE = 12.980 kg). This study establishes a reliable framework for precise fruit tree image segmentation and early yield estimation, advancing precision agriculture applications.
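The Pearson-based feature selection step described above can be sketched as follows. This is a minimal illustration, not the study's implementation: the correlation threshold, the toy feature columns, and the yield values are hypothetical assumptions chosen only to show the mechanics of screening candidate predictors against measured yield.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    xc = x - x.mean()
    yc = y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))

def select_features(X, y, threshold=0.5):
    """Keep columns of X whose |r| with y meets the threshold.

    Returns (indices of retained columns, correlations of all columns).
    The 0.5 threshold is an illustrative assumption, not the paper's value.
    """
    rs = np.array([pearson_r(X[:, j], y) for j in range(X.shape[1])])
    keep = np.where(np.abs(rs) >= threshold)[0]
    return keep, rs

# Toy data: simulated yield per tree (kg) and three candidate features
# standing in for, e.g., a vegetation index, a structural ratio, and noise.
rng = np.random.default_rng(0)
n = 40
yield_kg = rng.uniform(20.0, 80.0, n)
X = np.column_stack([
    0.01 * yield_kg + rng.normal(0.0, 0.05, n),  # strongly yield-related
    -0.02 * yield_kg + rng.normal(0.0, 0.50, n),  # weakly yield-related
    rng.normal(0.0, 1.0, n),                      # unrelated noise
])
keep, rs = select_features(X, yield_kg, threshold=0.5)
print("retained columns:", keep)
print("correlations:", np.round(rs, 3))
```

In the study's pipeline, the retained features would then feed the KNN, PLS, RF, or SVM regressors; the same screening applies unchanged whether the candidates are vegetation indices, structural feature ratios, or SPAD.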