Steel surface defect detection is an important industrial application of object detection technology: detecting surface defects accurately helps improve product quality. Steel surface defects often occupy small areas, appear at small scales, and closely resemble the background, which leads to low detection accuracy. To address these issues, we propose RSTD-YOLOv7, a steel surface defect detection method based on YOLOv7. First, the RFBVGG module and the SimAM attention mechanism are integrated into the YOLOv7 backbone network to expand the receptive field, reduce the loss of texture information, and strengthen the model's ability to extract target features. Second, an STRVGG module built on the Swin Transformer is incorporated into the neck network; it improves the extraction of deep information concealed within the feature maps, reduces feature loss, and improves feature detection. Then, an improved DSDH detection head is employed to raise the model's detection precision and accelerate network convergence. Finally, comparative experiments are conducted on the NEU-DET and GC10-DET datasets. The results show that the proposed method attains the highest detection accuracy among the compared methods, achieving an mAP of 79.3% and 73.2%, respectively. Compared with the original YOLOv7 model, the mAP increases by 15.9% and 9.6%, the number of parameters decreases by 11.3 M and 11.5 M, and the FPS increases by 15.7% and 11.5% on the two datasets, respectively. These results show that the proposed model excels in both detection accuracy and speed and exhibits strong generalization capability.
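SimAM, which the backbone integrates, is a parameter-free attention mechanism: each activation is reweighted by a sigmoid of its inverse "energy", so activations that stand out from the rest of their channel receive larger weights. The sketch below applies this to a single channel stored as nested Python lists. It assumes the commonly used reference formulation (channel-wide mean and variance with the variance divided by H·W − 1, λ = 1e-4). The function name `simam` and the default `lam` are illustrative assumptions; this is not the RSTD-YOLOv7 implementation, whose exact settings are not given here.

```python
import math
from typing import List


def simam(feature: List[List[float]], lam: float = 1e-4) -> List[List[float]]:
    """Hypothetical sketch: SimAM reweighting of one H x W channel.

    Assumes the common reference formulation; lam (lambda) = 1e-4 is an assumed default.
    Requires at least two activations in the channel.
    """
    values = [v for row in feature for v in row]
    n = len(values) - 1                     # H*W - 1, as in the reference formulation
    mu = sum(values) / len(values)          # channel mean
    var = sum((v - mu) ** 2 for v in values) / n  # channel variance estimate

    out = []
    for row in feature:
        new_row = []
        for t in row:
            # Inverse of the minimal energy: larger for activations far from the mean.
            inv_energy = (t - mu) ** 2 / (4 * (var + lam)) + 0.5
            weight = 1.0 / (1.0 + math.exp(-inv_energy))  # sigmoid gate
            new_row.append(t * weight)
        out.append(new_row)
    return out


if __name__ == "__main__":
    # The outlier 5.0 receives a larger attention weight than the 1.0 background values.
    print(simam([[1.0, 1.0], [1.0, 5.0]]))
```

Because the weights come from a closed-form statistic rather than learned layers, SimAM adds no parameters, which is consistent with a design that enhances feature extraction without enlarging the model.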