Performance Evaluation of a Proposed Machine Learning Model for Chronic Disease Datasets Using an Integrated Attribute Evaluator and an Improved Decision Tree Classifier

Cited by: 35
Authors
Mishra, Sushruta [1]
Mallick, Pradeep Kumar [1]
Tripathy, Hrudaya Kumar [1]
Bhoi, Akash Kumar [2]
Gonzalez-Briones, Alfonso [3,4,5]
Affiliations
[1] Deemed Univ, Sch Comp Engn, Kalinga Inst Ind Technol, Bhubaneswar 751024, India
[2] Sikkim Manipal Univ, Sikkim Manipal Inst Technol, Dept Elect & Elect Engn, Majitar 737136, Sikkim, India
[3] Univ Complutense Madrid, Res Grp Agent Based Social & Interdisciplinary Ap, Madrid 28040, Spain
[4] Univ Salamanca, BISITE Res Grp, Calle Espejo S-N,Edificio Multiusos I D I, Salamanca 37007, Spain
[5] Air Inst, IoT Digital Innovat Hub, Calle Segunda 4, Salamanca 37188, Spain
Source
APPLIED SCIENCES-BASEL | 2020 / Vol. 10 / No. 22
Keywords
attribute selection; wrapper; filter; classification; regression; chronic diseases; diabetes; heart disease; breast cancer; decision tree; K-means clustering; FEATURE-SELECTION METHOD; CORONARY-HEART-DISEASE; MUTUAL INFORMATION; PREDICTION; RELEVANCE; NETWORK; HYBRID
DOI
10.3390/app10228137
Chinese Library Classification (CLC) number
O6 [Chemistry]
Subject classification code
0703
Abstract
Chronic diseases are rising steadily worldwide. They weaken immunity and reduce the quality of daily life, and their treatment is a challenging task for medical professionals. Dimensionality reduction techniques make it possible to handle large data samples and provide decision support for chronic diseases. Chronic disease datasets contain a series of symptoms that are used for disease prediction; redundant and irrelevant symptoms in these datasets should be identified and removed with feature selection techniques to improve classification accuracy. The main contribution of this paper is therefore a comparative analysis of how filter and wrapper feature selection methods affect classification performance. The filter methods considered are Correlation Feature Selection (CFS), Information Gain (IG) and Chi-Square (CS). The wrapper methods considered are Best First Search (BFS), Linear Forward Selection (LFS) and Greedy Step Wise Search (GSS). A decision tree algorithm, implemented in the WEKA tool, is used as the classifier throughout the analysis, and an attribute significance analysis is performed on the diabetes, breast cancer and heart disease datasets. Among the filter methods, CFS outperformed the others in both accuracy and execution time: its accuracy on the heart disease, diabetes and breast cancer datasets was 93.8%, 89.5% and 96.8%, respectively, with latencies of 1.08 s, 1.02 s and 1.01 s. Among the wrapper methods, BFS performed best, achieving maximum accuracies of 94.7%, 95.8% and 96.8% on the heart disease, diabetes and breast cancer datasets, respectively, with latencies of 1.42 s, 1.44 s and 132 s.
Based on these results, a new hybrid attribute evaluator is proposed that integrates enhanced K-Means clustering with the CFS filter method and the BFS wrapper method. The hybrid method is evaluated with an improved decision tree classifier that combines clustering with classification. Validated on 14 different chronic disease datasets, the approach showed consistently high classification performance, with mean accuracy, specificity, sensitivity and F-score of 96.7%, 96.5%, 95.6% and 96.2%, respectively.
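To make the filter criteria named above concrete, here is a minimal Python sketch of information gain and of a correlation-based subset merit in the form usually attributed to Hall's CFS (merit = k·mean(r_cf) / sqrt(k + k(k-1)·mean(r_ff)), with symmetric uncertainty as the correlation measure for discrete attributes). It is not the authors' WEKA pipeline: the function names, the attribute names and the eight-record dataset are hypothetical, and the search is a plain greedy forward selection, a simplification of best-first search (which additionally keeps a queue of candidate expansions and can backtrack).

```python
import math
from collections import Counter
from itertools import combinations


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def conditional_entropy(target, feature):
    """H(target | feature) for two aligned sequences of discrete values."""
    n = len(target)
    groups = {}
    for f, t in zip(feature, target):
        groups.setdefault(f, []).append(t)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def information_gain(target, feature):
    """IG = H(target) - H(target | feature)."""
    return entropy(target) - conditional_entropy(target, feature)


def symmetric_uncertainty(a, b):
    """SU = 2 * IG / (H(a) + H(b)), a correlation measure scaled to [0, 1]."""
    denom = entropy(a) + entropy(b)
    return 0.0 if denom == 0 else 2.0 * information_gain(a, b) / denom


def cfs_merit(subset, features, target):
    """Hall-style merit: rewards class correlation, penalises inter-attribute redundancy."""
    k = len(subset)
    r_cf = sum(symmetric_uncertainty(target, features[f]) for f in subset) / k
    pairs = list(combinations(subset, 2))
    r_ff = (sum(symmetric_uncertainty(features[a], features[b]) for a, b in pairs)
            / len(pairs)) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def greedy_forward(features, target):
    """Repeatedly add the attribute that most raises the merit; stop when none does."""
    selected, best = [], 0.0
    remaining = list(features)
    while remaining:
        # Score every one-attribute extension; ties are broken by attribute name.
        merit, name = max((cfs_merit(selected + [f], features, target), f)
                          for f in remaining)
        if merit <= best:
            break
        selected.append(name)
        remaining.remove(name)
        best = merit
    return selected, best


# Hypothetical toy data: 8 records, a class label and three discrete attributes.
target = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
features = {
    "high_glucose": ["y", "y", "n", "n", "y", "n", "y", "n"],  # predictive
    "glucose_copy": ["y", "y", "n", "n", "y", "n", "y", "n"],  # redundant duplicate
    "noise":        ["a", "b", "a", "b", "a", "b", "b", "a"],  # irrelevant
}

print(greedy_forward(features, target))  # (['high_glucose'], 1.0)
```

On this toy data the search keeps `high_glucose`, rejects its exact duplicate because the redundancy term in the denominator cancels any gain, and never adds `noise`, whose information gain with respect to the class is zero. This mirrors the motivation stated in the abstract for removing redundant and irrelevant symptoms before classification.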
Pages: 1-35
Page count: 35