Prediction model using SMOTE, genetic algorithm and decision tree (PMSGD) for classification of diabetes mellitus

Cited by: 47
Authors
Azad, Chandrashekhar [1]
Bhushan, Bharat [2]
Sharma, Rohit [3]
Shankar, Achyut [4]
Singh, Krishna Kant [5]
Khamparia, Aditya [6]
Affiliations
[1] Natl Inst Technol, Dept Comp Applicat, Jamshedpur, Bihar, India
[2] Sharda Univ, Sch Engn & Technol, Dept Comp Sci & Engn, Greater Noida, India
[3] SRM Inst Sci & Technol, Fac Engn & Technol, Dept Elect & Commun Engn, NCR Campus,Delhi NCR Campus, Ghaziabad, India
[4] Amity Univ, Dept CSE, ASET, Noida 201301, India
[5] Jain, Dept Comp Sci & Engn, Bengaluru, India
[6] Lovely Profess Univ, Sch Comp Sci & Engn, Punjab, India
Keywords
Decision tree; Genetic algorithm; SMOTE; Data classification; Healthcare; Machine learning; Support vector machines; Fuzzy classification; Synthetic minority; Feature selection; Optimization; Diagnosis; Imbalance; Design
DOI
10.1007/s00530-021-00817-2
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Diabetes mellitus is a well-known chronic disease that diminishes the insulin-producing capability of the human body. The result is a high blood sugar level, which may lead to complications such as eye damage, nerve damage, cardiovascular damage, kidney damage and stroke. Although diabetes has attracted considerable research attention, the performance of machine learning techniques in classifying the disease remains relatively low, mainly because of class imbalance and missing values in the data. In this paper, we propose a novel Prediction Model using Synthetic Minority Oversampling Technique, Genetic Algorithm and Decision Tree (PMSGD) for classification of diabetes mellitus on the Pima Indians Diabetes Database (PIDD) dataset. The framework of the proposed PMSGD prediction model is composed of four layers. The first is the pre-processing layer, which handles missing values, detects outliers and oversamples the minority class. In the second layer, the most significant features are selected using correlation and a genetic algorithm. In the third layer, the proposed model is trained, and in the fourth layer its effectiveness is evaluated in terms of classification accuracy (CA), classification error (CE), precision, recall (sensitivity), F-measure (FM) and area under the ROC curve (AUROC). The proposed PMSGD algorithm outperforms its counterparts, achieving an accuracy of 82.1256%. Its best results in terms of CA, CE, precision, sensitivity, FM and AUROC are 82.1256%, 17.8744%, 0.8070, 0.8598, 0.8326 and 0.8511, respectively. The simulation results show the effectiveness and superiority of the proposed PMSGD model, which thereby reduces the error rate and supports the decision-making process.
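The evaluation measures named in the abstract are related by standard formulas, so the reported figures can be cross-checked. The following is a minimal sketch of those formulas, not the authors' evaluation code. The function names and the confusion-matrix counts in the usage lines are hypothetical, and the reported precision is read as the fraction 0.8070 rather than a percentage. AUROC is left out because it cannot be derived from a single confusion matrix.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, fn, tn):
    """Compute CA, CE, precision, recall and FM from confusion-matrix counts.

    tp/fp/fn/tn are the true-positive, false-positive, false-negative and
    true-negative counts of a binary classifier.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total      # classification accuracy (CA)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)           # also called sensitivity
    return {
        "CA": accuracy,
        "CE": 1.0 - accuracy,         # classification error is the complement of CA
        "precision": precision,
        "recall": recall,
        "FM": f_measure(precision, recall),
    }


# Consistency check on the figures reported in the abstract.
reported_fm = f_measure(0.8070, 0.8598)
print(round(reported_fm, 4))          # compare with the reported FM of 0.8326

# Hypothetical counts, for illustration only (not taken from the paper).
example = classification_metrics(tp=8, fp=2, fn=2, tn=8)
print(example)
```

Under these assumptions, the reported precision and recall reproduce the reported F-measure to four decimal places, and the reported CA and CE sum to 100%.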
Pages: 1289-1307
Number of pages: 19