A Hybrid Model Focusing on Data Pre-Processing in Diabetes Diagnosis

Authors
Zeidi, Farnaz [1]
Azar, Lalah [1]
Arslan, Vasfiye [1]
Erol, Cigdem [2,3]
Affiliations
[1] Istanbul Univ, Inst Sci, Div Informat, Istanbul, Turkey
[2] Istanbul Univ, Informat Dept, Istanbul, Turkey
[3] Istanbul Univ, Fac Sci, Dept Biol, Div Bot, Istanbul, Turkey
Keywords
Classification algorithms; diabetes diagnosis; hybrid model; K-means algorithm; normalization; outlier detection
DOI
10.1080/01969722.2022.2080338
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Diabetes mellitus is a common and serious disease that has been studied by many researchers, and the Pima Indians Diabetes Dataset is one of the best-known datasets in this field. This study aims to increase the accuracy of machine learning algorithms in diagnosing the disease, and to reveal patterns that enable its early diagnosis, by focusing on the pre-processing stage. In that stage, the proposed hybrid model fills in missing values with KNN, compares six different normalization methods, and removes outliers with K-means. In the classification stage, four algorithms (C4.5, SVM, Naive Bayes and KNN) were examined to find the best hybrid model. The models were evaluated by accuracy. Compared with previous studies, the best combinations achieved higher accuracies: 98.3% for (KNN + n5 + K-means + SVM) and 99.1% for (KNN + n4/n3 + K-means + KNN). Finally, we offer concluding remarks and some suggestions for further study.
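The abstract names the three pre-processing steps but not their settings. The following is a minimal Python sketch (standard library only) of that kind of pipeline under loudly stated assumptions; it is not the authors' implementation. The function names are hypothetical, the distance is Euclidean, min-max scaling stands in for one of the six normalization methods (the abstract does not say what n3, n4 or n5 denote), and "outlier" is assumed to mean a point whose distance to its own K-means centroid exceeds the mean plus z standard deviations of all such distances.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Row = List[Optional[float]]  # None marks a missing measurement


def _dist(a: Sequence[Optional[float]], b: Sequence[Optional[float]]) -> float:
    """Euclidean distance over features present (not None) in both rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)
                         if x is not None and y is not None))


def knn_impute(rows: List[Row], k: int = 3) -> List[List[float]]:
    """Fill each None with that feature's mean over the k nearest complete rows."""
    complete = [r for r in rows if None not in r]
    if not complete:
        raise ValueError("need at least one complete row")
    filled = []
    for r in rows:
        if None not in r:
            filled.append(list(r))
            continue
        neighbours = sorted(complete, key=lambda c: _dist(r, c))[:k]
        filled.append([v if v is not None
                       else sum(n[j] for n in neighbours) / len(neighbours)
                       for j, v in enumerate(r)])
    return filled


def min_max_normalize(rows: List[List[float]]) -> List[List[float]]:
    """Assumption: min-max scaling to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(r, lo, hi)]
            for r in rows]


def kmeans(points: List[List[float]], k: int, iters: int = 100,
           seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Plain Lloyd's algorithm; returns (centroids, cluster label per point)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]

    def assign() -> List[int]:
        # index of the nearest current centroid for every point
        return [min(range(k), key=lambda c: _dist(p, centroids[c])) for p in points]

    for _ in range(iters):
        labels = assign()
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # keep the old centroid if a cluster ends up empty
            new.append([sum(col) / len(members) for col in zip(*members)]
                       if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return centroids, assign()


def keep_inliers_kmeans(points: List[List[float]], k: int = 2, z: float = 2.0,
                        seed: int = 0) -> List[int]:
    """Assumption: keep indices whose centroid distance <= mean + z * std."""
    centroids, labels = kmeans(points, k, seed=seed)
    d = [_dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    mu = sum(d) / len(d)
    sd = math.sqrt(sum((x - mu) ** 2 for x in d) / len(d))
    return [i for i, x in enumerate(d) if x <= mu + z * sd]
```

A pipeline in this spirit would call `knn_impute`, then `min_max_normalize`, then keep only the rows (and their class labels) at the indices returned by `keep_inliers_kmeans` before training a classifier such as SVM or KNN and measuring accuracy; the classifiers are not sketched here. For the Pima data, zeros that stand for missing measurements would first have to be converted to `None`. Returning indices rather than filtered rows is a design choice so that the class labels can be filtered in step with the features.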
Pages: 1199-1211
Number of pages: 13