Enhancing human activity recognition model performance through KMeans-based stratified data splitting

Cited by: 0
Authors
Tuyen, Nguyen Trung
Dang, Nguyen Thanh [1]
Minh, Duong Hon [2]
Nguyen, Thanh Q. [3]
Affiliations
[1] HUTECH Univ, Fac Informat Technol, Ho Chi Minh City, Vietnam
[2] Nguyen Tat Thanh Univ, Fac Pharm, Ho Chi Minh City, Vietnam
[3] Nguyen Tat Thanh Univ, Inst Interdisciplinary Social Sci, 300A Nguyen Tat Thanh St,Ward 13,Dist 4, Ho Chi Minh 700000, Vietnam
Keywords
Human activity recognition (HAR); data balancing; Kolmogorov-Smirnov test; KMeans clustering; stratified train-test split; feature selection; K-nearest neighbors (KNN); dimensionality reduction; PCA; LDA; machine learning
DOI
10.1177/00202940241312873
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Human Activity Recognition (HAR) plays a crucial role in healthcare, sports, security, and human-computer interaction. A major challenge in HAR is that features are imbalanced and unevenly distributed between the training and testing datasets, which biases machine learning models and reduces prediction performance. This study proposes a novel approach that combines KMeans clustering with a stratified data-splitting strategy. By stratifying on the clusters that KMeans produces, the method ensures that both the training and testing datasets contain representative data from every cluster, improving the model's reliability and generalizability. The Kolmogorov-Smirnov test is used to assess the uniformity of the feature distribution. Experimental results show that the method significantly improves model performance, achieving an accuracy of 98.58%, a recall of 98.66%, a precision of 98.65%, and an F1 score of 98.65%. These findings not only improve the effectiveness of current HAR models but also open new research avenues for optimizing feature distribution in complex, multidimensional problems.
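The splitting idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the number of clusters, the test fraction, or the dataset layout, so `n_clusters`, `test_size`, the helper names `cluster_stratified_split` and `ks_pvalues`, and the synthetic data below are all assumptions made for illustration. The sketch clusters the feature matrix with scikit-learn's `KMeans`, passes the cluster labels to `train_test_split` as the stratification key, and then runs a two-sample Kolmogorov-Smirnov test per feature with SciPy's `ks_2samp`.

```python
import numpy as np
from scipy.stats import ks_2samp
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split


def cluster_stratified_split(X, y, n_clusters=5, test_size=0.3, seed=0):
    """Split (X, y) so every KMeans cluster is proportionally present in both parts.

    n_clusters and test_size are illustrative defaults, not values from the paper.
    """
    # Assign each sample to a cluster based on its features only.
    clusters = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X)
    # Stratify on the cluster labels rather than on the activity labels y.
    return train_test_split(
        X, y, clusters, test_size=test_size, stratify=clusters, random_state=seed
    )


def ks_pvalues(X_train, X_test):
    """Two-sample KS test per feature; a large p-value gives no evidence of a mismatch."""
    return np.array(
        [ks_2samp(X_train[:, j], X_test[:, j]).pvalue for j in range(X_train.shape[1])]
    )


# Synthetic stand-in for sensor features: three shifted Gaussian blobs (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 4)) + rng.integers(0, 3, size=(600, 1)) * 4.0
y = rng.integers(0, 6, size=600)  # hypothetical activity labels, unused by the split

X_tr, X_te, y_tr, y_te, c_tr, c_te = cluster_stratified_split(X, y, n_clusters=3)
pvals = ks_pvalues(X_tr, X_te)
print("train/test sizes:", len(X_tr), len(X_te))
print("per-feature KS p-values:", np.round(pvals, 3))
```

Stratifying on cluster labels keeps each region of feature space represented in both splits; whether a classifier such as KNN then benefits, as the abstract reports, would have to be checked on the real data.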
Pages: 20