Introduction: Accurate prediction of joint torque supports injury prevention by quantifying the forces acting on joints during activity. Traditional approaches, including inverse dynamics, EMG-driven neuromusculoskeletal (NMS) models, and standard machine learning methods, typically use surface EMG (sEMG) signals and kinematic data. However, these methods often struggle to capture the complex, non-linear relationship between muscle activation and joint motion, particularly for complex or unfamiliar movements. Generalizing joint torque estimation models across individuals is a further challenge, because feature transferability tends to decline in the higher, task-specific layers of a network, which reduces model performance.

Methods: In this study, we propose a CNN-GRU-Attention neural network informed by an NMS solver (hybrid-CNN) and augmented with transfer learning, designed to predict knee joint torque with higher accuracy. The network was trained with EMG signals, joint angles, and muscle forces as inputs to predict knee joint torque during different activities, and its predictive performance was evaluated both within and between subjects. Additionally, we developed a transfer learning method for the inter-subject model, which improved the accuracy of knee torque prediction by transferring knowledge learned from previous participants to new participants.

Results: The hybrid-CNN model predicted knee joint torque within subjects with a significantly lower error (root mean square error ≤ 0.16 Nm/kg). Adopting the transfer learning technique in the inter-subject tests significantly improved generalizability, again with a lower error (root mean square error ≤ 0.14 Nm/kg).

Conclusion: The transfer learning-enhanced CNN-GRU-Attention network combined with the NMS model shows great potential for predicting knee joint torque.
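
The errors reported in the Results are root mean square errors normalized to body mass (Nm/kg). As a minimal sketch of how such a metric can be computed, the Python function below takes a measured and a predicted torque series in Nm and divides their RMSE by body mass. The function name `rmse_per_kg`, the sample values, and the choice to normalize after computing the RMSE are illustrative assumptions, not the study's actual evaluation code.

```python
import math


def rmse_per_kg(measured_nm, predicted_nm, body_mass_kg):
    """Return the RMSE between two torque series (Nm), normalized by body mass (Nm/kg).

    Hypothetical helper for illustration; the study's own pipeline is not described
    at this level of detail.
    """
    if len(measured_nm) != len(predicted_nm) or len(measured_nm) == 0:
        raise ValueError("torque series must be non-empty and of equal length")
    if body_mass_kg <= 0:
        raise ValueError("body mass must be positive")
    # Mean of the squared sample-wise errors, in (Nm)^2.
    mean_sq_error = sum(
        (m - p) ** 2 for m, p in zip(measured_nm, predicted_nm)
    ) / len(measured_nm)
    # Square root gives RMSE in Nm; dividing by mass gives Nm/kg.
    return math.sqrt(mean_sq_error) / body_mass_kg


# Made-up example values: every sample is off by 7 Nm for a 70 kg participant.
measured = [35.0, 42.0, 28.0, 14.0]
predicted = [42.0, 35.0, 21.0, 21.0]
error = rmse_per_kg(measured, predicted, body_mass_kg=70.0)
print(f"normalized RMSE: {error:.2f} Nm/kg")
```

Normalizing by body mass makes errors comparable across participants of different sizes, which matters when intra-subject and inter-subject results are reported side by side.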