Machine learning framework with feature selection approaches for thyroid disease classification and associated risk factors identification

Cited by: 9
Authors
Azrin Sultana
Rakibul Islam
Affiliations
[1] American International University-Bangladesh, Department of Computer Science
Keywords
Thyroid disease prediction; Random forest; Healthcare; Machine learning; Feature selection
DOI
10.1186/s43067-023-00101-5
Abstract
Thyroid disease (TD) develops when the thyroid does not produce an adequate quantity of thyroid hormones, or when a lump or nodule forms through aberrant growth of the thyroid gland. Early detection is therefore important for preventing or minimizing the impact of the disease. In this study, several machine learning (ML) algorithms were combined with a scaling method, an oversampling technique, and various feature selection approaches to build an efficient framework for classifying TD. The proposed system also identifies significant risk factors for TD. The dataset was obtained from the University of California Irvine (UCI) repository. In the preprocessing stage, the Synthetic Minority Oversampling Technique (SMOTE) was used to address the class imbalance problem, and robust scaling was used to scale the dataset. Boruta, Recursive Feature Elimination (RFE), and the Least Absolute Shrinkage and Selection Operator (LASSO) were used to select relevant features. To train the model, we employed six ML classifiers: Support Vector Machine (SVM), AdaBoost (AB), Decision Tree (DT), Gradient Boosting (GB), K-Nearest Neighbors (KNN), and Random Forest (RF). The models were evaluated using 5-fold cross-validation (CV), and several performance metrics were compared across the algorithms. The RF classifier gave the most accurate results, with 99% accuracy. The proposed system could help physicians and patients classify TD and learn about its associated risk factors.
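The two preprocessing steps named in the abstract, robust scaling and SMOTE, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `robust_scale` and `smote`, the neighbour count `k`, and the fixed seed are assumptions made for illustration, and only the Python standard library is used. Robust scaling is taken here to mean centering on the median and dividing by the interquartile range; SMOTE is taken to mean interpolating between a minority sample and one of its nearest minority neighbours.

```python
import math
import random
import statistics


def robust_scale(column):
    """Scale one feature column as (x - median) / IQR."""
    median = statistics.median(column)
    # Quartiles with linear interpolation between order statistics.
    q1, _, q3 = statistics.quantiles(column, n=4, method="inclusive")
    iqr = (q3 - q1) or 1.0  # guard against a constant column
    return [(x - median) / iqr for x in column]


def smote(minority, n_new, k=2, seed=0):
    """Return n_new synthetic samples interpolated between minority points.

    Illustrative only: assumes `minority` holds at least two samples,
    each a list of floats of equal length.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of `base`, excluding `base`.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(p, base))[:k]
        neighbour = rng.choice(neighbours)
        # A random point on the segment from `base` to `neighbour`.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic
```

To balance two classes, `n_new` would typically be set to the majority count minus the minority count. The synthetic samples would be generated from training data only, so that no interpolated point leaks information from a held-out fold during cross-validation.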