Human activity recognition (HAR) remains a challenging problem in computer vision due to the unpredictable nature of human activities. In recent years, researchers have proposed various hybrid models for HAR that focus on spatial features, temporal features, or both. However, models based on spatial and temporal features alone fail to capture these features fully and exhibit limited accuracy during training and testing. To address these challenges, we present TriFusion, a hybrid model that integrates spatial, temporal, and high-level features to improve accuracy in HAR. Unlike previous fusion methods, our approach combines three deep learning architectures: VGG16 spatial features are fed into a BiGRU for temporal feature extraction and are also connected directly to the TriFusion layer, alongside both the BiGRU output and the features of a transfer-learning branch (ResNet18). TriFusion achieves an average accuracy of 99.92% on the UCF101 dataset and 99.78% on the HMDB51 dataset, demonstrating its suitability for real-time deployment in HAR applications. Although designed for HAR, TriFusion also shows promise across other AI domains, including human-computer interaction and diverse classification tasks. The code of TriFusion is publicly accessible at https://github.com/TripleTheGreatDali/TriFusionHAR.
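To make the three-stream fusion concrete, the following is a minimal NumPy sketch of the fusion step described above. All dimensions, function names, and the linear classifier are illustrative assumptions, not the authors' exact configuration: the three streams stand in for VGG16 spatial features, BiGRU temporal features, and ResNet18 transfer-learning features, which TriFusion concatenates before classification.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse(spatial, temporal, transfer):
    """Concatenate the three feature streams into one fused vector
    (a common fusion strategy; the exact fusion layer is an assumption)."""
    return np.concatenate([spatial, temporal, transfer], axis=-1)

def classify(fused, weights, bias):
    """Illustrative single linear layer + softmax over activity classes."""
    logits = fused @ weights + bias
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)

# Assumed dimensions: 512-d spatial (VGG16-style), 256-d temporal (BiGRU),
# 512-d transfer (ResNet18), batch of 4 clips, 101 classes (UCF101).
spatial  = rng.standard_normal((4, 512))
temporal = rng.standard_normal((4, 256))
transfer = rng.standard_normal((4, 512))

fused = fuse(spatial, temporal, transfer)   # shape (4, 1280)
W = rng.standard_normal((1280, 101)) * 0.01
b = np.zeros(101)
probs = classify(fused, W, b)               # shape (4, 101), rows sum to 1
```

In a full implementation the three backbones would produce these features from video frames, and the classifier weights would be learned end to end.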