To address the limited depth information and weak depth perception inherent in monocular 3D object detection, we introduce Dp-M3D, a novel monocular 3D object detection algorithm equipped with depth perception capabilities. To model long-range feature dependencies effectively during the fusion of depth maps and image features, we introduce a Transformer Feature Fusion Encoder (TFFEn). TFFEn integrates depth and image features, enabling more comprehensive long-range feature modeling; this strengthens depth perception and ultimately improves 3D detection accuracy. To improve the detection of objects truncated at image edges, we propose a Feature Enhancement method based on Deformable Convolution (FEDC). FEDC uses depth-confidence guidance to determine the deformation offsets of the 3D bounding box, aligning features more effectively and further improving depth perception. Furthermore, to address the ranking issue in which candidate boxes with accurate depth predictions but low classification confidence are suppressed, we propose a Depth-perception Non-Maximum Suppression (Dp-NMS) algorithm. Dp-NMS ranks candidate boxes by the product of classification confidence and depth confidence, ensuring that candidates are ordered effectively and the most suitable detection box is retained. Experimental results on the challenging KITTI 3D object detection dataset show that the proposed method achieves AP3D scores of 23.41%, 13.65%, and 12.91% on the easy, moderate, and hard difficulty levels, respectively. Our approach outperforms state-of-the-art monocular 3D object detection algorithms based on images and image-depth-map fusion, with particularly significant improvements in detecting edge-truncated objects.
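To make the fusion step concrete, the following is a minimal PyTorch sketch of a transformer-based fusion encoder in the spirit of TFFEn. It is not the paper's implementation: the module name, channel sizes, and the choice of a standard `nn.TransformerEncoder` over concatenated image and depth tokens are all assumptions, as is the equal spatial resolution of the two feature maps.

```python
import torch
import torch.nn as nn

class TFFEnSketch(nn.Module):
    """Hypothetical fusion encoder: depth and image feature maps are projected
    to a shared dimension, flattened to tokens, and passed through a standard
    Transformer encoder so self-attention can model long-range dependencies
    across both modalities."""
    def __init__(self, img_channels=256, depth_channels=64, d_model=256,
                 nhead=8, num_layers=2):
        super().__init__()
        self.img_proj = nn.Conv2d(img_channels, d_model, kernel_size=1)
        self.depth_proj = nn.Conv2d(depth_channels, d_model, kernel_size=1)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, img_feat, depth_feat):
        # assumes img_feat and depth_feat share the same spatial size (H, W)
        b, _, h, w = img_feat.shape
        tokens = torch.cat([
            self.img_proj(img_feat).flatten(2).transpose(1, 2),     # (B, HW, C)
            self.depth_proj(depth_feat).flatten(2).transpose(1, 2)  # (B, HW, C)
        ], dim=1)                                                    # (B, 2HW, C)
        fused = self.encoder(tokens)
        # return the image-token half, reshaped back into a feature map
        return fused[:, :h * w].transpose(1, 2).reshape(b, -1, h, w)
```

Concatenating the two token sets before self-attention lets every image location attend to every depth location (and vice versa), which is one plausible way to realize the long-range cross-modal modeling the abstract describes.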
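Likewise, a hypothetical sketch of depth-confidence-guided deformable feature alignment, built on `torchvision.ops.DeformConv2d`. The small offset head that maps a depth-confidence map to sampling offsets is an illustrative assumption, not the paper's FEDC design.

```python
import torch.nn as nn
from torchvision.ops import DeformConv2d

class FEDCSketch(nn.Module):
    """Hypothetical depth-confidence-guided deformable block: a small conv head
    predicts per-location sampling offsets from the depth-confidence map, and a
    deformable convolution resamples the image features with those offsets."""
    def __init__(self, channels=256, kernel_size=3):
        super().__init__()
        # 2 offsets (dx, dy) per kernel sampling location
        self.offset_head = nn.Conv2d(1, 2 * kernel_size * kernel_size,
                                     kernel_size=3, padding=1)
        self.deform_conv = DeformConv2d(channels, channels,
                                        kernel_size=kernel_size,
                                        padding=kernel_size // 2)

    def forward(self, feat, depth_conf):
        # depth_conf: (B, 1, H, W) map in [0, 1]; it steers where the kernel samples
        offsets = self.offset_head(depth_conf)
        return self.deform_conv(feat, offsets)
```

The intuition matching the abstract: near image edges, where objects are truncated, depth confidence can steer the sampling grid toward the visible, reliably observed parts of the object rather than the cropped region.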
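The ranking rule of Dp-NMS, scoring each candidate by the product of classification and depth confidence before greedy suppression, can be sketched as follows. This assumes axis-aligned 2D box projections and per-box depth confidences in [0, 1]; `dp_nms` and `iou_thresh` are illustrative names, not the paper's API.

```python
import numpy as np

def iou_2d(box, boxes):
    """Axis-aligned IoU between one box and an array of boxes (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def dp_nms(boxes, cls_conf, depth_conf, iou_thresh=0.5):
    """Rank candidates by cls_conf * depth_conf, then greedily suppress overlaps."""
    scores = cls_conf * depth_conf           # joint ranking score
    order = np.argsort(scores)[::-1]         # highest joint score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        ious = iou_2d(boxes[i], boxes[order[1:]])
        order = order[1:][ious < iou_thresh]  # drop boxes overlapping the kept one
    return keep
```

Under this rule, a detection with moderate classification confidence but an accurate, high-confidence depth estimate can outrank a confidently classified box whose depth is poor, which is precisely the suppression failure the abstract says Dp-NMS targets.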