Recent advances in deep learning and artificial intelligence have underscored the importance of human motion prediction in fields such as intelligent robotics, autonomous driving, and human-computer interaction. Existing human motion prediction methods focus mainly on innovations in network structure and feature extraction, often overlooking the underlying logic of spatio-temporal change in motion data. This oversight can introduce conflicts into the coupled modeling of spatial and temporal dependencies and obscure the spatio-temporal logic of human motion. In this paper, we address this issue by decoupling the spatio-temporal features, employing time-series modeling for a preliminary prediction, and introducing velocity data as a separate learning branch to capture joint dependencies. Velocity provides an explicit quantitative measure of human movement, strengthening the model's ability to recognize motion patterns. We map trajectory change rules onto joint change trends at future time steps, thereby refining the prediction results. In addition, we enhance local semantic information through a patching method and preserve the independence of multi-scale representations along the spatial and temporal dimensions using a two-branch framework. We propose DCMixer, a multi-layer perceptron (MLP)-based network structure designed to learn multi-scale dynamic information and perform internal feature extraction. Our approach achieves spatio-temporal fusion with greater kinematic consistency, significantly improving model performance. Evaluations on three public datasets demonstrate superior prediction performance compared to state-of-the-art methods. The code is publicly available at https://github.com/Dabanshou/STTSN.
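To make the decoupled two-branch idea concrete, the following NumPy sketch is a minimal simplification of ours, not the authors' DCMixer implementation: all names (`patch`, `mixer_block`, the weight shapes) and the late-fusion choice are hypothetical. It shows a pose sequence split into a position branch and a first-order velocity branch, patched along the time axis to enrich local semantics, and processed by MLPs that mix temporal tokens and channel features separately before the branches are fused.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy pose sequence: T frames, J joints, C coordinates; P = patch length in time.
T, J, C, P = 16, 22, 3, 4
x = rng.standard_normal((T, J * C))      # joints flattened per frame

# Velocity branch: first-order differences carry an explicit rate of motion.
v = np.diff(x, axis=0, prepend=x[:1])

def patch(seq, p):
    """Group p consecutive frames into one token (patching along time)."""
    t, d = seq.shape
    return seq.reshape(t // p, p * d)

def mlp(inp, w1, w2):
    """Two-layer MLP with ReLU, the basic mixing unit."""
    return np.maximum(inp @ w1, 0.0) @ w2

xp, vp = patch(x, P), patch(v, P)        # (T/P, P*J*C) tokens per branch
d = xp.shape[1]

# Decoupled mixing weights: one pair acts across time tokens, one across channels.
wt1, wt2 = rng.standard_normal((2, T // P, T // P)) * 0.1
wc1, wc2 = rng.standard_normal((2, d, d)) * 0.02

def mixer_block(tokens):
    tokens = tokens + mlp(tokens.T, wt1, wt2).T  # mix across temporal tokens
    tokens = tokens + mlp(tokens, wc1, wc2)      # mix across channel features
    return tokens

# Late fusion of the position and velocity branches (one token per patch).
fused = mixer_block(xp) + mixer_block(vp)
print(fused.shape)
```

Keeping the temporal and channel mixing matrices separate is what keeps the two dimensions independent in this sketch; a prediction head mapping the fused tokens back to future frames is omitted for brevity.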