Objective UAVs are easy to control, inexpensive, and perform well, and they can carry out tasks efficiently across diverse sites and complex environments. Target detection in UAV aerial images is widely applied in practice, including in urban transportation, military reconnaissance, and smart agriculture. To address the missed and false detections caused by large variations in target scale, densely distributed small targets, and complex backgrounds in UAV aerial images, this study proposes a small-target detection algorithm that builds on an improved YOLOv7-tiny and uses a ConvMixer detection head.
Methods First, the LeakyReLU activation function is replaced with SiLU to compensate for the limited nonlinear expressiveness of LeakyReLU and to improve convergence speed and model generalization during training. Second, to strengthen feature extraction for multi-scale targets and improve the detection of small targets, a small-target detection layer is designed. This adds a tiny-target detection head that increases the model's receptive field and better handles the scale variance caused by drastic changes in target size. Third, a ConvMixer layer is integrated into the prediction head; its depthwise and pointwise convolutions capture the spatial and channel relationships in the features, respectively, improving the handling of small targets. Finally, the coupled detection head of YOLOv7-tiny is replaced with a more efficient decoupled head, which separates the feature channels used for localization and classification and improves the accuracy of both. For evaluation, ablation experiments are designed in two directions to verify the effectiveness of each improvement, and comparative experiments assess the detection performance of the improved algorithm against other algorithms.
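As an illustration of the depthwise and pointwise convolutions mentioned in the Methods, the following is a minimal sketch of a generic ConvMixer-style layer in plain Python. It is not the authors' implementation: the function names, the nested-list tensor layout [C][H][W], the odd kernel size with zero padding, the GELU activation, and the omission of batch normalization are all assumptions made for brevity. It only shows that the depthwise step mixes information within each channel (spatial relationships) while the pointwise step mixes information across channels at each pixel.

```python
import math


def gelu(v):
    """Exact GELU activation: v * Phi(v), with Phi the standard normal CDF."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def depthwise_conv(x, kernels):
    """Filter each channel with its own k x k kernel (stride 1, zero padding).

    x: nested list [C][H][W]; kernels: nested list [C][k][k], k assumed odd.
    Channels are never mixed, so only spatial relationships are captured.
    """
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    k = len(kernels[0])
    pad = k // 2
    out = [[[0.0] * width for _ in range(height)] for _ in range(channels)]
    for c in range(channels):
        for i in range(height):
            for j in range(width):
                total = 0.0
                for di in range(k):
                    for dj in range(k):
                        ii, jj = i + di - pad, j + dj - pad
                        # Positions outside the image count as zeros.
                        if 0 <= ii < height and 0 <= jj < width:
                            total += kernels[c][di][dj] * x[c][ii][jj]
                out[c][i][j] = total
    return out


def pointwise_conv(x, weights):
    """1 x 1 convolution: mix channels independently at every pixel.

    weights: nested list [C_out][C_in].
    """
    height, width = len(x[0]), len(x[0][0])
    return [
        [
            [sum(w * x[c][i][j] for c, w in enumerate(row)) for j in range(width)]
            for i in range(height)
        ]
        for row in weights
    ]


def convmixer_layer(x, dw_kernels, pw_weights):
    """One ConvMixer-style layer (hypothetical helper; batch norm omitted)."""
    spatial = depthwise_conv(x, dw_kernels)
    # Residual connection around the depthwise (spatial mixing) step.
    mixed = [
        [[xv + gelu(sv) for xv, sv in zip(x_row, s_row)]
         for x_row, s_row in zip(x_ch, s_ch)]
        for x_ch, s_ch in zip(x, spatial)
    ]
    # Pointwise (channel mixing) step followed by the activation.
    channel = pointwise_conv(mixed, pw_weights)
    return [[[gelu(v) for v in row] for row in ch] for ch in channel]


if __name__ == "__main__":
    # Toy 2-channel, 2 x 2 feature map with an averaging depthwise kernel.
    x = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, -1.0], [0.0, 2.0]]]
    box = [[1.0 / 9.0] * 3 for _ in range(3)]
    y = convmixer_layer(x, [box, box], [[1.0, 0.0], [0.5, 0.5]])
    print(len(y), len(y[0]), len(y[0][0]))
```

In a real detector these operations would normally be expressed with a deep learning framework's grouped and 1 x 1 convolutions; the loops here are only meant to make the two mixing steps explicit.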
Results and Discussions This study mainly addresses the following aspects: 1) The network structure of the improved algorithm is proposed, and the principles and components of each improvement are introduced. Starting from the YOLOv7-tiny network, the LeakyReLU activation function in the CBL convolution block is replaced by SiLU. A small-target detection layer is introduced at the neck of the network, and a prediction head is added. Several ConvMixer layers are integrated at the end of the backbone network and into the detection head. Finally, the efficient decoupled head structure is adopted for target prediction. Together, these changes to the baseline form the network structure of the improved YOLOv7-tiny algorithm. 2) Ablation experiments verify the effectiveness of each modification in two ways: first, by adding a single improved module to the original YOLOv7-tiny algorithm to observe its impact, and second, by removing individual modules from the final improved model, YOLOv7-tiny-SFCE, to evaluate their effect. Ten sets of ablation experiments are conducted under identical conditions. The results indicate that introducing the efficient decoupled head yields the largest accuracy gain, increasing mAP by 1.1%, while removing the fourth (small-target) detection head causes the most pronounced degradation, reducing detection accuracy by 2.4%. 3) Comparative experiments verify the overall performance of the improved algorithm. More than ten recently proposed advanced algorithms are compared in terms of AP and mAP across ten target categories. The proposed algorithm achieves the highest mAP, 40.9%, and performs best on the pedestrian, people, car, and motor categories; among these, pedestrian, people, and motor show especially strong detection performance. 4) The detection performance of the improved algorithm is verified in real-world scenarios through comparative analysis. Detection results are shown under various conditions, including sparse and dense target distributions and day and night scenes. Five images featuring dense targets, minimal targets, dark scenes, occluded targets, and complex backgrounds are randomly selected from the VisDrone2021 test-challenge set to evaluate detection performance in UAV aerial images. Visual comparison with the baseline YOLOv7-tiny shows that the proposed algorithm significantly improves the identification of multi-scale small targets and reduces both missed and false detections.
Conclusions This study addresses the missed and false detections caused by large variations in target scale, dense distributions of small targets, and complex backgrounds in UAV aerial images. Key contributions include strengthening the model's feature extraction, providing more accurate localization and classification, and improving small-target detection. However, limitations remain: 1) Some missed detections still occur for small targets that carry minimal pixel information and too few features to distinguish them from the background. 2) A balance between detection accuracy and real-time performance has not yet been achieved; the model's parameter count and computational complexity need to be reduced. Future research will focus on further improving the detection of very small targets and on making the model more lightweight. © 2025 Sichuan University. All rights reserved.