Human activity recognition (HAR) is an important research area in both machine learning and human-computer interaction. However, it remains extremely difficult owing to unresolved challenges such as sensor movement, sensor positioning, cluttered backgrounds, and the inherent variability in how different people perform the same activity. In this study, we developed an ensemble of classification models for HAR. The proposed HAR framework has four phases: preprocessing, segmentation, feature extraction, and classification. The preprocessing phase includes frame conversion and contrast enhancement. For segmentation, we developed an improved balanced iterative reducing and clustering using hierarchies (BIRCH) algorithm, which segments frames efficiently while using minimal resources. Gray-level co-occurrence matrix (GLCM) features and improved local gradient threshold pattern (LGTP) features are then extracted from the segmented images, together with conventional bag-of-visual-words (BoVW) features, to improve recognition performance. Classification is performed by an ensemble model that includes bidirectional gated recurrent unit (Bi-GRU), convolutional neural network (CNN), and long short-term memory (LSTM) classifiers. To further enhance the performance of the proposed model, we developed a blue monkey standardized aquila optimization (BMSAO) approach. Experimental evaluation against conventional techniques showed that the proposed framework achieves higher efficiency in HAR.
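To make the conventional GLCM texture features mentioned in the feature-extraction phase more concrete, the following is a minimal NumPy sketch. It is not the authors' implementation: quantizing 8-bit intensities to 8 gray levels, using a single one-pixel horizontal offset, building a symmetric matrix, and reporting only contrast, angular second moment (ASM), and homogeneity are all illustrative assumptions, and the function names `glcm` and `glcm_features` are hypothetical.

```python
import numpy as np


def glcm(image, levels=8, dx=1, dy=0):
    """Symmetric, normalized GLCM for one non-negative pixel offset (dy, dx).

    Hypothetical helper for illustration; assumes 8-bit intensities (0..255).
    """
    img = np.asarray(image, dtype=np.float64)
    # Quantize 0..255 intensities into `levels` gray levels.
    q = np.clip((img / 256.0 * levels).astype(int), 0, levels - 1)
    h, w = q.shape
    # Pair each reference pixel with its neighbor at offset (dy, dx).
    ref = q[0:h - dy, 0:w - dx]
    nbr = q[dy:h, dx:w]
    m = np.zeros((levels, levels), dtype=np.float64)
    # Count co-occurrences; np.add.at accumulates repeated index pairs.
    np.add.at(m, (ref.ravel(), nbr.ravel()), 1)
    m = m + m.T  # count each pair in both directions (symmetric GLCM)
    return m / m.sum()  # normalize counts to joint probabilities


def glcm_features(p):
    """Classic Haralick-style statistics computed from a normalized GLCM."""
    i, j = np.indices(p.shape)
    contrast = float(np.sum(p * (i - j) ** 2))
    asm = float(np.sum(p ** 2))  # angular second moment
    homogeneity = float(np.sum(p / (1.0 + np.abs(i - j))))
    return {"contrast": contrast, "asm": asm, "homogeneity": homogeneity}


if __name__ == "__main__":
    # A toy 4x4 "frame": left half black, right half white.
    frame = np.array([[0, 0, 255, 255]] * 4, dtype=np.uint8)
    print(glcm_features(glcm(frame)))
```

For this toy frame the sharp vertical edge yields a high contrast (49/3, about 16.33), whereas a uniform frame gives contrast 0 and ASM and homogeneity of 1. In practice, GLCMs are usually computed at several offsets and angles, and the resulting statistics are concatenated into the feature vector.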