A Hybrid Model Focusing on Data Pre-Processing in Diabetes Diagnosis

Authors
Zeidi, Farnaz [1]
Azar, Lalah [1]
Arslan, Vasfiye [1]
Erol, Cigdem [2,3]
Affiliations
[1] Istanbul Univ, Inst Sci, Div Informat, Istanbul, Turkey
[2] Istanbul Univ, Informat Dept, Istanbul, Turkey
[3] Istanbul Univ, Fac Sci, Dept Biol, Div Bot, Istanbul, Turkey
Keywords
Classification algorithms; diabetes diagnosis; hybrid model; K-means algorithm; normalization; outlier detection
DOI
10.1080/01969722.2022.2080338
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
Diabetes mellitus is a common and serious disease that has been studied by many researchers. The Pima Indians Diabetes Dataset is one of the best-known datasets in this field. By focusing on the pre-processing stage, this study aims to increase the accuracy of machine learning algorithms in diagnosing the disease and to reveal patterns that enable its early diagnosis. In the pre-processing stage, the proposed hybrid model fills in missing values with KNN, compares six different normalization methods, and removes outliers with K-means. In the classification stage, four algorithms (C4.5, SVM, Naive Bayes, and KNN) were examined to find the best hybrid model, and the models were evaluated by accuracy. Compared with previous studies, the (KNN + n5 + K-means + SVM) and (KNN + n4/n3 + K-means + KNN) models achieved higher accuracies of 98.3% and 99.1%, respectively. Finally, we offer concluding remarks and some suggestions for further study.
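The pre-processing and classification stages summarized above can be sketched in dependency-free Python. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names and the toy data are hypothetical; `min_max` and `z_score` are two common normalization methods standing in for the six the study compares, since the abstract does not say which methods n3, n4, and n5 denote; the outlier rule (drop a sample whose class differs from the majority class of its K-means cluster) is an assumption, because the abstract does not state the criterion; and only the KNN classifier is shown, not C4.5, SVM, or Naive Bayes. Missing values are marked with `None`.

```python
import math
import statistics
from collections import Counter


def distance(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_impute(rows, k=3):
    """KNN imputation: fill each None with the mean of that feature over k nearest rows."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for other_i, other in enumerate(rows):
                if other_i == i or other[j] is None:
                    continue
                gaps = [(a - b) ** 2 for a, b in zip(row, other)
                        if a is not None and b is not None]
                dist = math.sqrt(sum(gaps)) if gaps else math.inf  # co-observed features only
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            filled[i][j] = statistics.mean([v for _, v in candidates[:k]])
    return filled


def min_max(rows):
    """Normalization example 1: rescale each column to [0, 1]."""
    cols = list(zip(*rows))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(r, lo, hi)]
            for r in rows]


def z_score(rows):
    """Normalization example 2: zero mean, unit population standard deviation."""
    cols = list(zip(*rows))
    mu = [statistics.mean(c) for c in cols]
    sd = [statistics.pstdev(c) for c in cols]
    return [[(v - m) / s if s > 0 else 0.0 for v, m, s in zip(r, mu, sd)]
            for r in rows]


def kmeans(points, k=2, iters=50):
    """Lloyd's K-means with farthest-point initialization; returns cluster indices."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(distance(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: distance(p, centroids[c]))
                  for p in points]
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new.append([statistics.mean(col) for col in zip(*members)]
                       if members else centroids[c])
        if new == centroids:  # centroids fixed, so assignments can no longer change
            break
        centroids = new
    return labels


def drop_kmeans_outliers(X, y, k=2):
    """Assumed rule: drop samples whose label differs from their cluster's majority."""
    clusters = kmeans(X, k)
    majority = {}
    for c in set(clusters):
        votes = Counter(t for t, cl in zip(y, clusters) if cl == c)
        majority[c] = votes.most_common(1)[0][0]
    keep = [i for i, c in enumerate(clusters) if y[i] == majority[c]]
    return [X[i] for i in keep], [y[i] for i in keep]


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training samples closest to x."""
    nearest = sorted(range(len(train_X)), key=lambda i: distance(train_X[i], x))
    return Counter(train_y[i] for i in nearest[:k]).most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k=3):
    """Fraction of test samples whose KNN prediction matches the true label."""
    hits = sum(knn_predict(train_X, train_y, x, k) == t for x, t in zip(test_X, test_y))
    return hits / len(test_y)


if __name__ == "__main__":
    # Hypothetical toy data: two features per sample, None marks a missing value.
    X = [[0, 0], [0, 1], [1, None], [1, 1], [10, 10], [10, 11], [11, None], [11, 11]]
    y = [0, 0, 0, 1, 1, 1, 1, 1]
    X = min_max(knn_impute(X, k=2))
    X, y = drop_kmeans_outliers(X, y, k=2)
    print(accuracy(X, y, [[0.05, 0.05], [0.95, 0.95]], [0, 1]))
```

Run as a script, the example fills the two missing values, rescales both features to [0, 1], removes the one training sample whose label disagrees with its cluster, and prints the accuracy on two hypothetical test points. Farthest-point initialization keeps the K-means step deterministic; an evaluation on the real dataset would use a held-out split or cross-validation rather than hand-picked points.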
Pages: 1199-1211
Page count: 13