The accurate and efficient detection of steel surface defects remains challenging due to complex backgrounds, diverse defect types, and varying defect scales. Existing CNN-based methods often struggle to capture long-range dependencies and to handle complex background noise, resulting in suboptimal performance. Meanwhile, although Transformer-based approaches are effective at modeling global context, they typically require large-scale datasets and are computationally expensive, which limits their practicality in industrial applications. To address these challenges, we introduce a novel attention-based salient object detector, called the ASOD, to improve the detection of strip steel surface defects. In particular, we first design a novel channel-attention-based block that uses global max and average pooling to focus on relevant channel-wise features while suppressing irrelevant channel responses: max pooling extracts the dominant features of local regions while removing irrelevant ones, and average pooling captures the overall features while removing local details. Then, a new spatial-attention-based block is designed to emphasize the regions containing strip steel surface defects while suppressing irrelevant background areas. In addition, a new cross-spatial-attention-based block is designed to fuse the multi-scale feature maps filtered by the proposed channel and spatial attention, producing features with richer semantic and spatial information so that the detector adapts to strip steel defects of multiple sizes. The experiments show that the ASOD achieves superior performance across multiple evaluation metrics, with a weighted F-measure of 0.9559, a structure measure of 0.9230, a Pratt's figure of merit of 0.0113, and a mean absolute error of 0.0144.
In addition, the ASOD demonstrates strong robustness to noise interference, maintaining consistently high performance even with 10-20% dataset noise, which confirms its stability and reliability.
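
To make the channel and spatial gating described above concrete, the following is a minimal sketch in plain Python and is not the ASOD implementation. It assumes a CBAM-style design: the max- and average-pooled channel descriptors pass through a shared two-layer MLP and are summed before a sigmoid, and the spatial gate replaces a learned convolution with a plain sum of the channel-wise max and mean. The function names and the weights `w1` and `w2` are hypothetical and used only for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(fmap, w1, w2):
    """Gate each channel using global max and average pooling.

    fmap: list of C channels, each an H x W list of lists.
    w1 (hidden x C) and w2 (C x hidden): weights of a shared two-layer
    MLP. Both are hypothetical; the abstract does not specify them.
    """
    flat = [[v for row in ch for v in row] for ch in fmap]
    max_desc = [max(vals) for vals in flat]              # dominant response per channel
    avg_desc = [sum(vals) / len(vals) for vals in flat]  # overall response per channel

    def mlp(desc):
        hidden = [max(0.0, sum(w * d for w, d in zip(row, desc))) for row in w1]  # ReLU
        return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

    # Assumption: the two pooled branches are summed before the sigmoid.
    gates = [sigmoid(a + b) for a, b in zip(mlp(max_desc), mlp(avg_desc))]
    scaled = [[[g * v for v in row] for row in ch] for g, ch in zip(gates, fmap)]
    return gates, scaled


def spatial_attention(fmap):
    """Gate each pixel using the max and mean taken across channels.

    Assumption: a learned convolution over the two pooled maps is
    replaced here by their plain sum, to keep the sketch parameter-free.
    """
    n_ch, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    gate = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            col = [fmap[c][i][j] for c in range(n_ch)]
            gate[i][j] = sigmoid(max(col) + sum(col) / n_ch)
    out = [[[gate[i][j] * fmap[c][i][j] for j in range(width)]
            for i in range(height)] for c in range(n_ch)]
    return gate, out
```

In this sketch, a channel with stronger pooled responses receives a gate closer to 1, and pixels with strong responses across channels are emphasized while weak background pixels are attenuated.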