Assessing impacts of data volume and data set balance in using deep learning approach to human activity recognition

Cited by: 0
Authors
Chen, Haipeng [1]
Xiong, Fuhai [1]
Wu, Dihong [1]
Zheng, Lingxiang [1]
Peng, Ao [1]
Hong, Xuemin [1]
Tang, Biyu [1]
Lu, Hai [1]
Shi, Haibin [1]
Zheng, Huiru [2]
Affiliations
[1] Xiamen Univ, Sch Informat Sci & Engn, Xiamen, Peoples R China
[2] Ulster Univ, Sch Comp, Coleraine, Antrim, North Ireland
Source
2017 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM) | 2017
Keywords
human activity recognition; deep learning; LSTM; CNN
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Over the past decade, deep learning has developed rapidly and has had a significant impact on a variety of application domains. In recent years it has been applied to human activity recognition as a substitute for well-established analysis techniques that rely on handcrafted feature extraction and classification methods. However, less attention has been paid to the influence of training data on recognition accuracy. In this paper, we assess the influence of data volume and data balance on human activity recognition when using deep learning approaches. We evaluate the relationship between the data volume of the training dataset and the prediction accuracy of deep learning algorithms. Because the balance of data between activity categories affects recognition accuracy, we modify the SMOTE algorithm so that it can be applied to human activity recognition. Results show that when the data volume is small (< 4M), recognition accuracy increases quickly as the quantity of training data grows. However, the growth in recognition accuracy slows once the data quantity reaches 4 million, and further increasing the data volume does not significantly improve activity recognition performance. We therefore conclude that a data volume of 4 million can ensure sufficient accuracy for human activity recognition. Meanwhile, balancing the data set not only improves the recognition accuracy of minority categories but also helps to increase overall accuracy.
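The abstract does not say how the authors modified SMOTE, so the following is only a minimal sketch of the standard SMOTE interpolation step, not their method. It assumes each activity window has already been flattened into a fixed-length feature vector. The function name `smote_oversample`, its parameter defaults, and the class names and random data in the usage lines are hypothetical and for illustration only.

```python
import numpy as np


def smote_oversample(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority-class samples by standard SMOTE interpolation.

    minority: array of shape (n_samples, n_features) holding one class only.
    Returns an array of shape (n_synthetic, n_features).
    """
    minority = np.asarray(minority, dtype=float)
    n = len(minority)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)  # cannot have more neighbours than other samples
    rng = np.random.default_rng(seed)

    # Pairwise squared Euclidean distances between minority samples.
    diff = minority[:, None, :] - minority[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # For each synthetic sample: pick a base sample, pick one of its k
    # nearest neighbours, and interpolate at a random point between them.
    base = rng.integers(0, n, size=n_synthetic)
    picked = neighbours[base, rng.integers(0, k, size=n_synthetic)]
    gap = rng.random((n_synthetic, 1))
    return minority[base] + gap * (minority[picked] - minority[base])


# Usage with made-up data: 200 majority windows, 20 minority windows.
data_rng = np.random.default_rng(1)
walking = data_rng.normal(0.0, 1.0, size=(200, 6))  # hypothetical majority class
falling = data_rng.normal(3.0, 1.0, size=(20, 6))   # hypothetical minority class
synthetic = smote_oversample(falling, n_synthetic=len(walking) - len(falling))
balanced_falling = np.vstack([falling, synthetic])
print(balanced_falling.shape)  # (200, 6)
```

Because every synthetic sample lies on a line segment between two real minority samples, this kind of oversampling adds no information from outside the minority class; whether it preserves the temporal structure of raw sensor windows is an open question that the abstract's modification presumably addresses.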
Pages: 1160-1165
Page count: 6