Unmanned Aerial Vehicles (UAVs) have recently emerged as a promising platform for acquiring high-resolution imagery in urban environments. Efficiently detecting cars in these images is vital for applications such as traffic management, urban planning, and security. However, the abundance of background features and the large variability in car appearance within high-resolution UAV images pose significant challenges. This paper introduces a multi-task framework that enhances car detection by focusing specifically on road regions. The model uses a shared encoder and a fully convolutional decoder, augmented by an attentive binary fusion module for road and car segmentation. For car detection, we combine deep layer aggregation with a CenterNet detection head. During training, UAV images are downsampled and passed through the encoder and both decoders to generate road/car confidence maps and car detection results. At inference, a region extraction module crops high-resolution road segments according to the road segmentation mask; to improve detection accuracy, it concatenates the input image with the car confidence map. We also introduce a scale-weighted focal loss to address the difficulty of detecting small cars in high-resolution UAV images. Experimental results on the UAVid2020 and VisDrone2020 datasets demonstrate that our model outperforms existing approaches in both inference time and accuracy, meeting the real-time requirements of intelligent traffic systems.
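The abstract does not give the form of the scale-weighted focal loss; the sketch below is one plausible interpretation, assuming a CenterNet-style heatmap focal loss in which each object's contribution is reweighted inversely with its box area so that small cars are emphasized. All parameter names and values (`alpha`, `beta`, `ref_area`, the clipping range) are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def scale_weighted_focal_loss(pred, target, box_areas,
                              alpha=2.0, beta=4.0, ref_area=1024.0):
    """Hypothetical scale-weighted focal loss sketch.

    pred, target : per-object center confidences (CenterNet-style heatmap
                   values, target == 1 at exact centers, soft elsewhere).
    box_areas    : pixel area of each object's bounding box.
    The weight grows as areas fall below ref_area, so small cars
    contribute more to the loss (weighting scheme is an assumption).
    """
    pred = np.clip(pred, 1e-6, 1 - 1e-6)
    pos = (target == 1).astype(float)
    # Standard CenterNet focal terms: positives at centers, penalty-reduced
    # negatives everywhere else.
    pos_loss = -((1 - pred) ** alpha) * np.log(pred) * pos
    neg_loss = -((1 - target) ** beta) * (pred ** alpha) \
               * np.log(1 - pred) * (1 - pos)
    # Assumed scale weight: sqrt(ref_area / area), clipped to [1, 4].
    w = np.clip(np.sqrt(ref_area / np.maximum(box_areas, 1.0)), 1.0, 4.0)
    n_pos = max(pos.sum(), 1.0)
    return float((w * (pos_loss + neg_loss)).sum() / n_pos)
```

Under this weighting, a small car (e.g. 10x10 px) with the same prediction error incurs a larger loss than a large one, which is the stated goal of the scale weighting.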