Aerial objects in remote sensing images exhibit large variations in scale and arbitrary orientations. To cope with these characteristics, most existing anchor-based detectors rely on preset anchors with various scales, angles, and aspect ratios. This leads to misalignment in candidate region selection, object feature extraction, and the label assignment of preset boxes, which degrades detection performance. To address this issue, we propose a hierarchical adaptive alignment network (HAA-Net). Specifically, we design a region refinement module (RRM), a feature alignment module (FAM), and a potential label assignment module (PLAM) to alleviate misalignment at the region, feature, and label levels, respectively. Furthermore, we adopt a gradient equalization strategy to jointly optimize these modules across levels, so that the whole network can be fully trained and detection performance is significantly improved. Extensive experiments on three common aerial object datasets, namely the Dataset of Object Detection in Aerial Images (DOTA), the High-Resolution Ship Collection 2016 (HRSC2016), and the University of Chinese Academy of Sciences-Aerial Object Detection (UCAS-AOD) dataset, demonstrate that our approach achieves superior performance compared with state-of-the-art detectors.
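To make concrete why preset anchors with variable scales, angles, and aspect ratios are costly, the following minimal sketch enumerates the rotated anchors placed at a single feature-map location. It is a generic illustration under stated assumptions, not HAA-Net's anchor design: the function name `rotated_anchors`, the base size, and the scale, ratio, and angle sets are all hypothetical values chosen for the example. Every combination becomes a separate preset box, so the count grows multiplicatively.

```python
import itertools
import math


def rotated_anchors(cx, cy, base_size, scales, ratios, angles_deg):
    """Enumerate (cx, cy, w, h, theta_rad) anchors for every scale/ratio/angle combination.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    anchors = []
    for s, r, a in itertools.product(scales, ratios, angles_deg):
        area = (base_size * s) ** 2
        w = math.sqrt(area / r)  # aspect ratio r = h / w, area held constant
        h = w * r
        anchors.append((cx, cy, w, h, math.radians(a)))
    return anchors


# Hypothetical settings: 3 scales x 3 aspect ratios x 6 angles.
scales = [1, 2, 4]
ratios = [0.5, 1.0, 2.0]
angles = [-90, -60, -30, 0, 30, 60]

anchors = rotated_anchors(16.0, 16.0, 32, scales, ratios, angles)
print(len(anchors))  # 54 preset boxes at one location
```

With these assumed settings, a single location already carries 54 preset boxes, each of which must be matched to ground truth during label assignment; denser angle or ratio sets enlarge this further.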