We present a novel tracking-based approach to multi-object detection in aerial videos. The proposed method involves four main steps. First, the motion history image and the tracking trajectories are both used to extract candidate target regions. Second, spatio-temporal saliency is used to detect moving objects within the candidate regions. Third, previously detected objects are tracked in the current frame by mean shift. Finally, the detection results are fused with the tracking results to obtain refined detections, which in turn are used to update the tracking models. The proposed algorithm is evaluated on the VIVID aerial videos, and the results show that our approach can reliably detect moving objects even in challenging situations. Moreover, the proposed method processes videos in real time, without introducing a time delay.
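
The abstract does not say how the motion history image is computed. As an illustration of the first step only, the following minimal sketch uses the standard formulation, in which a pixel is set to a duration `tau` when motion is observed and otherwise decays by one per frame. Detecting motion by thresholded frame differencing, the function names `update_mhi` and `candidate_mask`, and the parameters `tau` and `diff_threshold` are all assumptions made for this example, not details taken from the proposed method. Camera motion compensation, which aerial video would normally require, is omitted.

```python
import numpy as np

def update_mhi(mhi, prev_frame, frame, tau=10, diff_threshold=25):
    """Return the updated motion history image for one new grayscale frame.

    tau and diff_threshold are illustrative values, not taken from the paper.
    """
    # Absolute frame difference, computed in a signed type to avoid uint8 wraparound.
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    moving = diff > diff_threshold
    # Moving pixels are reset to tau; all others decay by one step, floored at zero.
    return np.where(moving, tau, np.maximum(mhi - 1, 0)).astype(np.int32)

def candidate_mask(mhi):
    """Mark pixels with recent motion (assumed criterion: any nonzero history)."""
    return mhi > 0
```

In a full pipeline, the mask returned by `candidate_mask` would presumably be combined with regions predicted from the tracking trajectories before saliency detection is applied. That combination is not sketched here because the abstract gives no details of it.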