A Hybrid Model Focusing on Data Pre-Processing in Diabetes Diagnosis

Authors
Zeidi, Farnaz [1]
Azar, Lalah [1]
Arslan, Vasfiye [1]
Erol, Cigdem [2, 3]
Affiliations
[1] Istanbul Univ, Inst Sci, Div Informat, Istanbul, Turkey
[2] Istanbul Univ, Informat Dept, Istanbul, Turkey
[3] Istanbul Univ, Fac Sci, Dept Biol, Div Bot, Istanbul, Turkey
Keywords
Classification algorithms; diabetes diagnosis; hybrid model; K-means algorithm; normalization; outliers detection
DOI
10.1080/01969722.2022.2080338
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Diabetes mellitus is a common and serious disease that has been studied by many researchers, and the Pima Indians Diabetes Dataset is one of the best-known datasets in this field. This study aims to increase the accuracy of machine learning algorithms in diagnosing the disease, and to reveal the patterns that enable its early diagnosis, by focusing on the pre-processing stages. In the pre-processing stage, the proposed hybrid model fills in missing values with KNN, compares six different normalization methods, and removes outliers with K-means. In the classification stage, four algorithms (C4.5, SVM, Naive Bayes, and KNN) were examined to find the best hybrid model. Performance is evaluated by accuracy. Compared with previous studies, the models achieved higher accuracy: 98.3% for (KNN + n5 + K-means + SVM) and 99.1% for (KNN + n4/n3 + K-means + KNN). Finally, we offer concluding notes and some suggestions for further study.
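
The three pre-processing steps named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation, and every detail below is an assumption. The function names are hypothetical. Min-max scaling stands in for the six normalization methods (n1 to n6), which the abstract does not define. The outlier rule (drop points farther from their nearest centroid than the mean distance plus two standard deviations) is one common convention, not necessarily the paper's. K-means is seeded with the first k rows only to keep the example deterministic.

```python
import math
import statistics


def knn_impute(rows, k=3):
    """Fill None entries with the mean of that feature over the k nearest rows."""
    def dist(a, b):
        # Compare only the features that are observed in both rows.
        shared = [(x - y) ** 2 for x, y in zip(a, b)
                  if x is not None and y is not None]
        return math.sqrt(sum(shared) / len(shared)) if shared else math.inf

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is None:
                # Candidate donors: other rows where feature j is present.
                donors = sorted(
                    (r for n, r in enumerate(rows) if n != i and r[j] is not None),
                    key=lambda r: dist(row, r),
                )[:k]
                filled[i][j] = statistics.fmean([r[j] for r in donors])
    return filled


def min_max_normalize(rows):
    """Scale every column to [0, 1] (assumed stand-in for the paper's n1..n6)."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0
         for x, lo, hi in zip(r, lows, highs)]
        for r in rows
    ]


def kmeans(rows, k=2, iters=20):
    """Plain Lloyd iterations; the first k rows are the initial centroids."""
    centroids = [list(r) for r in rows[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for r in rows:
            nearest = min(range(k), key=lambda c: math.dist(r, centroids[c]))
            clusters[nearest].append(r)
        # An empty cluster keeps its previous centroid.
        centroids = [
            [statistics.fmean(col) for col in zip(*members)] if members
            else centroids[c]
            for c, members in enumerate(clusters)
        ]
    return centroids


def remove_outliers(rows, centroids, z=2.0):
    """Drop rows whose distance to the nearest centroid exceeds mean + z * std."""
    d = [min(math.dist(r, c) for c in centroids) for r in rows]
    cutoff = statistics.fmean(d) + z * statistics.pstdev(d)
    return [r for r, di in zip(rows, d) if di <= cutoff]


def preprocess(rows, k_neighbors=3, k_clusters=2):
    """Chain the steps in the order the abstract lists them."""
    filled = knn_impute(rows, k_neighbors)
    scaled = min_max_normalize(filled)
    return remove_outliers(scaled, kmeans(scaled, k_clusters))
```

The output of `preprocess` would then be passed to a classifier such as SVM or KNN. In practice, a library implementation with better centroid initialization would usually replace the hand-written K-means here.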
Pages: 1199-1211
Page count: 13