Remote sensing image target detection is of great significance in the fields of resource exploration, intelligent navigation, and environmental monitoring. However, due to the complexity of remote sensing scenes, the diversity of target categories, and large variations in scale, existing target detection models cannot accurately perceive the shape and location of targets. In addition, current networks rely on complex, computationally heavy architectures, which hinders their practical use in remote sensing target detection scenarios. To overcome these limitations, in this article we propose an efficient target detection model for remote sensing images based on shape-location aware enhancement. First, we propose a lightweight attention-constrained Transformer structure combined with deformable operations as the feature extraction network, which adapts to the complex and variable target shapes in remote sensing scenes and speeds up network convergence. Second, we design Gram sampling to prevent the loss of location and shape information of remote sensing targets during the network sampling process. Finally, we propose the spatial reconstruction decoupled head. By introducing group normalization (GN) and a gating mechanism to adjust the information flow of the input features, the network can accurately extract shape and location information during prediction. Extensive experiments are conducted on three public remote sensing scene datasets. The results demonstrate that our model achieves superior performance and outperforms many state-of-the-art methods.
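To illustrate how GN and a gating mechanism can be combined in a decoupled detection head, a minimal PyTorch sketch is given below. The module name `GatedDecoupledHead`, the channel sizes, and the sigmoid gating form are illustrative assumptions only and do not reproduce the exact spatial reconstruction decoupled head described in the article.

```python
# Illustrative sketch only: a decoupled detection head that uses GroupNorm
# and a simple sigmoid gate to modulate feature flow, as a rough analogue of
# the shape/location-aware decoupled head idea. All names, channel counts,
# and the gating form are assumptions, not the paper's exact design.
import torch
import torch.nn as nn


class GatedDecoupledHead(nn.Module):
    def __init__(self, in_channels: int = 256, num_classes: int = 20):
        super().__init__()
        # Gate: 1x1 conv + sigmoid that re-weights input features,
        # controlling how much information flows into each branch.
        self.gate = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Classification branch with GroupNorm after the conv.
        self.cls_branch = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, padding=1),
            nn.GroupNorm(32, in_channels),
            nn.ReLU(inplace=True),
        )
        # Regression branch (box location/shape) with GroupNorm as well.
        self.reg_branch = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, padding=1),
            nn.GroupNorm(32, in_channels),
            nn.ReLU(inplace=True),
        )
        self.cls_pred = nn.Conv2d(in_channels, num_classes, 1)
        self.reg_pred = nn.Conv2d(in_channels, 4, 1)

    def forward(self, x: torch.Tensor):
        gated = x * self.gate(x)  # gate adjusts the information flow
        cls_out = self.cls_pred(self.cls_branch(gated))
        reg_out = self.reg_pred(self.reg_branch(gated))
        return cls_out, reg_out


if __name__ == "__main__":
    head = GatedDecoupledHead()
    feat = torch.randn(2, 256, 64, 64)        # dummy feature map from the neck
    cls_logits, box_deltas = head(feat)
    print(cls_logits.shape, box_deltas.shape)  # (2, 20, 64, 64) (2, 4, 64, 64)
```

The point of the sketch is only the separation of concerns: the gate modulates which spatial features reach the two branches, while GN keeps the per-branch statistics stable regardless of batch size, which is convenient for high-resolution remote sensing inputs.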