In cross-media information communication, watermarks must be embedded imperceptibly while remaining robust to screen-shooting attacks. However, existing robust watermarking methods often struggle to achieve both objectives simultaneously. This paper therefore proposes a novel end-to-end screen-shooting resistant image watermarking method based on dense blocks and the convolutional block attention module (CBAM). In the watermark embedding phase, an encoder that integrates dense connections and CBAM is employed. This design effectively extracts features from the cover image, enhancing the visual quality of watermarked images while ensuring a certain level of robustness. The noise layer, simulated by differentiable functions, covers moiré patterns, illumination, and perspective distortions, which significantly affect the screen-shooting process, as well as the commonly present Gaussian noise. During the watermark extraction phase, a gradient mask is used to guide the encoder toward generating watermarked images that are easier to decode, thereby enabling accurate extraction of the watermark. Finally, robustness is further improved by jointly training the encoder, the noise layer, and the decoder. Experimental results demonstrate that the proposed method not only achieves excellent visual quality, with a PSNR of 36.04 dB for the watermarked images, but also maintains a watermark extraction rate exceeding 95% under various shooting conditions (including different distances, angles, and devices). Notably, the extraction rate reaches 100% at shooting distances of 20 cm and 30 cm, demonstrating strong robustness.
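As an aid to interpreting the PSNR figure quoted above, the following is a minimal sketch of the standard PSNR definition, not the paper's evaluation code. The function name `psnr`, the flat-sequence image representation, and the 8-bit peak value of 255 are assumptions made for illustration.

```python
import math


def psnr(cover, watermarked, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two images.

    Images are given as flat sequences of pixel values of equal length.
    max_value is the peak pixel value (255 assumes 8-bit images).
    """
    if len(cover) != len(watermarked) or len(cover) == 0:
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(cover, watermarked)) / len(cover)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    cover = [10, 20, 30, 40]
    watermarked = [11, 21, 31, 41]  # every pixel off by one, so MSE = 1
    print(f"PSNR: {psnr(cover, watermarked):.2f} dB")
```

Under the 8-bit assumption, a PSNR of 36.04 dB corresponds to a mean squared error of roughly 16 between the cover and watermarked images.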