In steganography research, advances in deep learning have markedly improved the ability to embed secret messages in natural scene images. However, document images, whose color and background distributions differ substantially from those of scene images, remain a major challenge: hidden information must stay invisible without degrading the text-reading experience. To address this challenge, we propose an end-to-end framework designed specifically for document images, namely the Dual Attention Multi-scale Encoder-Decoder Architecture (DAMS). DAMS fully accounts for the pixel distributions and value deviations that arise during the formation of document images. To balance the information embedding and extraction processes, the encoder and decoder share the same Channel Attention Network (CAN) module. In addition, we introduce a Self-Attention Fusion (SAF) network that extracts and fuses multi-scale text-region features; its self-attention mechanism significantly enhances the perception of text-region features, thereby improving the effectiveness of secret-information embedding. Extensive experiments demonstrate that DAMS achieves state-of-the-art results, with an average accuracy of 99.99% and a PSNR of 40.52 dB under noise-free conditions, and an average accuracy of 99.32% and a PSNR of 38.24 dB under combined noise interference. The code will be released.
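The abstract does not specify the internals of the CAN module, but channel-attention blocks of this kind typically follow a squeeze-and-excitation pattern: pool each channel to a scalar, pass the descriptor through a bottleneck MLP with a sigmoid gate, and rescale the channels. The sketch below illustrates that generic pattern in NumPy; the function name, the `reduction` ratio, and the random stand-in weights are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def channel_attention(x, reduction=4, rng=None):
    """Squeeze-and-excitation-style channel attention (illustrative sketch only;
    weights are random stand-ins, not trained parameters from DAMS)."""
    rng = np.random.default_rng(0) if rng is None else rng
    c, h, w = x.shape
    # Squeeze: global average pooling over the spatial dims -> one scalar per channel
    z = x.mean(axis=(1, 2))                                    # shape (c,)
    # Excitation: bottleneck MLP with ReLU, then a sigmoid gate in (0, 1)
    w1 = rng.standard_normal((c // reduction, c)) / np.sqrt(c)
    w2 = rng.standard_normal((c, c // reduction)) / np.sqrt(c // reduction)
    s = 1.0 / (1.0 + np.exp(-(w2 @ np.maximum(w1 @ z, 0.0))))  # shape (c,)
    # Scale: reweight each channel's feature map by its gate value
    return x * s[:, None, None]

feat = np.ones((8, 4, 4))   # toy (channels, height, width) feature map
out = channel_attention(feat)
print(out.shape)  # (8, 4, 4)
```

Because the encoder and decoder share the same module, the same channel reweighting is applied symmetrically during embedding and extraction.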