Diabetes Prediction using SMOTE and Machine Learning

Cited by: 0
Authors
Sarayu, Maganti Khyathi [1]
Bhanu, Shaik Ayesha [1]
Deekshitha, Karanam [1]
Meghana, Maduri [1]
Joseph, Iwin Thanakumar [1]
Affiliations
[1] Koneru Lakshmaiah Educ Fdn, Dept Comp Sci & Engn, Vaddeswaram, Andhra Pradesh, India
Source
2024 SECOND INTERNATIONAL CONFERENCE ON INVENTIVE COMPUTING AND INFORMATICS, ICICI 2024 | 2024
Keywords
Diabetes Prediction; PIMA Dataset; Random Forest; Model Tuning; Data Preprocessing; Stratified Sampling; Class Imbalance; Performance Metrics; Machine Learning
DOI
10.1109/ICICI62254.2024.00011
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
This work explores diabetes prediction on the PIMA Indian Diabetes dataset. It examines how model tuning, evaluation criteria, and data preparation affect predictive performance. The dataset is prepared with feature scaling and stratified sampling, and oversampling is applied to address class imbalance. Using a Random Forest model, the study illustrates how adjusting the model's configuration can improve prediction accuracy. Performance is evaluated with stratified shuffle split validation, and the authors report large gains in accuracy, F-measure, precision, recall, and AUC. The work underlines the importance of data preparation for accurate diabetes prediction and offers an example of Random Forest model construction.
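The oversampling step mentioned in the abstract can be illustrated with a minimal SMOTE-style sketch. This is not the authors' implementation: the function names `smote` and `oversample`, the toy data, and the choice of `k` are assumptions made for illustration, and only the Python standard library is used. The sketch assumes the features are already scaled and that oversampling is applied to the training split only, after any stratified split.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority-class neighbours."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("SMOTE needs at least two minority samples")
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def oversample(X, y, minority_label=1, k=5, seed=0):
    """Balance a binary training set by adding synthetic minority samples."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_needed = (len(y) - len(minority)) - len(minority)
    new_points = smote(minority, max(n_needed, 0), k=k, seed=seed)
    return X + new_points, y + [minority_label] * len(new_points)


# Toy training data: two hypothetical, already-scaled features per row.
X_train = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.0, 0.4], [0.4, 0.0], [0.2, 0.3],
           [0.9, 0.8], [0.8, 0.9]]
y_train = [0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = oversample(X_train, y_train, k=1)
print(y_bal.count(0), y_bal.count(1))
```

Each synthetic point lies on the line segment between a minority sample and one of its nearest minority neighbours, so the new samples stay inside the region the minority class already occupies. In practice a library implementation, such as the one in imbalanced-learn, would normally be preferred over a hand-written version.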
Pages: 15-20
Page count: 6