To address the difficulty of extracting dynamic texture features in complex scenes, this paper proposes a dynamic texture modeling method based on a multi-scale convolutional autoencoder. The method fuses features from multiple scales of the convolutional network and applies a channel attention mechanism to increase the weight of the most informative channels. The loss function is optimized by combining reconstruction errors computed at multiple network levels. Experiments were conducted on the DynTex database. Compared with several typical dynamic texture feature extraction methods, the dynamic textures reconstructed by this model achieve the best overall quality, mitigating the blur, noise, and ghosting (residual image) artifacts that arise in dynamic texture reconstruction, and thereby verifying the effectiveness of the proposed modeling method.
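The following is a minimal sketch, not the authors' implementation, of the kind of architecture the abstract describes: a convolutional autoencoder that fuses features from two scales, reweights channels with a squeeze-and-excitation style attention block (used here as a stand-in for the unspecified attention mechanism), and sums reconstruction errors taken at multiple network levels. All layer sizes, the number of scales, and the choice of attention block are illustrative assumptions.

```python
# Sketch only: layer widths, scales, and the SE-style attention are assumptions,
# not details taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel weighting."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))            # global average pool -> per-channel weights
        return x * w.unsqueeze(-1).unsqueeze(-1)   # rescale each channel

class MultiScaleAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.attn = ChannelAttention(64 + 32)      # attention over fused multi-scale features
        self.dec2 = nn.Sequential(nn.ConvTranspose2d(64 + 32, 32, 4, stride=2, padding=1), nn.ReLU())
        self.dec1 = nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1)

    def forward(self, x):
        f1 = self.enc1(x)                          # scale-1 features
        f2 = self.enc2(f1)                         # scale-2 features
        f1_ds = F.avg_pool2d(f1, 2)                # bring scale-1 features to scale-2 resolution
        fused = self.attn(torch.cat([f2, f1_ds], dim=1))
        mid = self.dec2(fused)                     # intermediate decoder level
        out = self.dec1(mid)                       # full-resolution reconstruction
        return out, mid, f1

def multi_level_loss(model, x):
    # Sum reconstruction errors at several network levels: the full output against
    # the input, plus an intermediate decoder feature against the matching encoder feature.
    out, mid, f1 = model(x)
    return F.mse_loss(out, x) + F.mse_loss(mid, f1)

if __name__ == "__main__":
    model = MultiScaleAE()
    frames = torch.randn(4, 3, 64, 64)            # a batch of dynamic-texture frames
    loss = multi_level_loss(model, frames)
    loss.backward()
    print(float(loss))
```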