DCPM: an effective and robust approach for diabetes classification and prediction

Cited by: 5
Authors
Kumari M. [1]
Ahlawat P. [1]
Affiliations
[1] Department of CSE, Northcap University, Gurugram, Haryana
Keywords
Diabetes; Feature selection; Machine learning; Missing values; Outliers; Pre-processing
DOI
10.1007/s41870-021-00656-4
Abstract
Diabetes is one of the most common medical disorders and occurs due to malfunctioning of the pancreas. It raises the level of sugar in the body and poses a severe threat to human health by adversely affecting almost all major organs, including the kidneys, heart and eyes. A number of studies in the literature show that machine learning techniques can improve the early detection of disease and reduce medical error rates, helping to save lives. Developing an accurate and effective diabetes prediction model remains a challenge, as medical datasets often suffer from outliers and missing values. The aim of this study is to build an accurate and robust Diabetes Classification and Prediction Model (DCPM) on a dataset that suffers from the class imbalance problem and contains outliers and missing values. The proposed work devises a pre-processing technique that removes outliers, fills missing values, standardizes the data and selects relevant features for model learning in a pipelined manner. The pre-processing techniques were applied to the Pima Indian Diabetes (PID) dataset obtained from the University of California at Irvine (UCI) repository. The K-NN classifier is tuned to find the optimum value of k and is then trained and evaluated on the most predictive set of features of the pre-processed PID dataset. The performance of the proposed model is assessed using classification accuracy, precision, recall and F1-score. The proposed approach attains a classification accuracy, recall, precision and F1-score of 92.28%, 92.36%, 92.38% and 92.31%, respectively. The proposed model outperforms existing state-of-the-art approaches in terms of accuracy. The proposed DCPM can therefore assist medical experts by providing a quick, precise and reliable recommendation to consider when making a crucial decision about a patient's health.
© 2021, Bharati Vidyapeeth's Institute of Computer Applications and Management.
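The pipeline described in the abstract (fill missing values, remove outliers, standardize, then tune k for K-NN) can be illustrated with a minimal standard-library Python sketch. This is not the authors' implementation: the median imputation, the 1.5 x IQR outlier rule, the 5-fold cross-validation, the candidate values of k and the synthetic two-feature data are all assumptions chosen for illustration, and the feature selection step is omitted.

```python
import math
import random
import statistics


def impute_median(rows):
    """Replace None entries with the median of the observed values in that column."""
    medians = [statistics.median(v for v in col if v is not None) for col in zip(*rows)]
    return [[m if v is None else v for v, m in zip(row, medians)] for row in rows]


def iqr_mask(rows, k=1.5):
    """True for rows whose every feature lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    keep = [True] * len(rows)
    for col in zip(*rows):
        q1, _, q3 = statistics.quantiles(col, n=4)
        lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
        keep = [ok and lo <= v <= hi for ok, v in zip(keep, col)]
    return keep


def standardize(rows):
    """Scale each column to zero mean and unit (population) standard deviation."""
    cols = list(zip(*rows))
    mus = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training points (binary labels 0/1)."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0


def cv_accuracy(X, y, k, folds=5):
    """Accuracy of k-NN under simple interleaved k-fold cross-validation."""
    correct = 0
    for f in range(folds):
        test_idx = set(range(f, len(X), folds))
        tr_X = [x for i, x in enumerate(X) if i not in test_idx]
        tr_y = [t for i, t in enumerate(y) if i not in test_idx]
        correct += sum(knn_predict(tr_X, tr_y, X[i], k) == y[i] for i in test_idx)
    return correct / len(X)


# Synthetic stand-in for a medical dataset (assumption: not the PID data).
rng = random.Random(0)
X_raw, y = [], []
for _ in range(200):
    label = rng.randint(0, 1)
    row = [rng.gauss(2.0 * label, 1.0), rng.gauss(-1.5 * label, 1.0)]
    if rng.random() < 0.1:
        row[0] = None  # simulate a missing measurement
    X_raw.append(row)
    y.append(label)

# Pre-processing steps applied in sequence, then a search over odd k.
# For brevity the scaling uses all rows; a stricter setup would fit it per fold.
X = impute_median(X_raw)
mask = iqr_mask(X)
X = [row for row, ok in zip(X, mask) if ok]
y = [t for t, ok in zip(y, mask) if ok]
X = standardize(X)
best_k = max(range(1, 16, 2), key=lambda k: cv_accuracy(X, y, k))
print("best k:", best_k, "cv accuracy:", round(cv_accuracy(X, y, best_k), 3))
```

Restricting the search to odd k avoids tied votes in a two-class problem. The accuracy printed here reflects only the synthetic data and is unrelated to the figures reported in the abstract.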
Pages: 1079-1088
Number of pages: 9