Machine learning framework with feature selection approaches for thyroid disease classification and associated risk factors identification

Cited by: 9
Authors
Azrin Sultana
Rakibul Islam
Affiliation
[1] American International University-Bangladesh, Department of Computer Science
Keywords
Thyroid disease prediction; Random forest; Healthcare; Machine learning; Feature selection
DOI
10.1186/s43067-023-00101-5
Abstract
Thyroid disease (TD) develops when the thyroid gland does not produce the appropriate amount of thyroid hormones, or when a lump or nodule forms because of abnormal growth of the gland. Early detection is therefore important for preventing or minimizing the impact of the disease. In this study, several machine learning (ML) algorithms, combined with a scaling method, an oversampling technique, and various feature selection approaches, were applied to build an efficient framework for classifying TD. The proposed system also identifies significant risk factors for TD. The dataset for this research was obtained from the University of California, Irvine (UCI) Machine Learning Repository. In the preprocessing stage, the Synthetic Minority Oversampling Technique (SMOTE) was used to address the class imbalance problem, and robust scaling was used to scale the dataset. The Boruta, Recursive Feature Elimination (RFE), and Least Absolute Shrinkage and Selection Operator (LASSO) approaches were used to select appropriate features. To train the models, we employed six ML classifiers: Support Vector Machine (SVM), AdaBoost (AB), Decision Tree (DT), Gradient Boosting (GB), K-Nearest Neighbors (KNN), and Random Forest (RF). The models were evaluated using 5-fold cross-validation (CV), and several performance metrics were used to compare the effectiveness of the algorithms. The RF classifier produced the best results, with 99% accuracy. The proposed system can help physicians and patients classify TD and learn about its associated risk factors.
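The abstract names SMOTE but not its settings. As a rough illustration of what the oversampling step does, here is a minimal standard-library sketch. It assumes numeric features, a binary label, and a hypothetical helper `smote`. In practice one would normally use a maintained implementation such as the `SMOTE` class in the imbalanced-learn package. Each synthetic point is an interpolation between a minority sample and one of its nearest minority-class neighbours:

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic new minority-class samples (hypothetical helper).

    Each synthetic sample lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours
    (Euclidean distance), which is the core idea of SMOTE.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # a point has at most len-1 neighbours
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # rank the other minority samples by distance to `base`
        others = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (nb - b) for b, nb in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: 6 majority rows vs 3 minority rows (2 features each).
majority = [[5.0, 5.0], [5.5, 4.8], [6.0, 5.2], [5.2, 6.1], [4.9, 5.7], [6.3, 5.5]]
minority = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.3]]
new_rows = smote(minority, n_synthetic=len(majority) - len(minority), k=2)
balanced_minority = minority + new_rows
print(len(balanced_minority))  # 6, same as the majority class
```

Because every new point is a convex combination of two real minority samples, it stays inside the region already occupied by that class instead of duplicating rows exactly.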
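Robust scaling is usually understood as centring each feature on its median and dividing by its interquartile range (IQR), so a few extreme values do not dominate the scale. The abstract gives no parameters. The sketch below therefore assumes the common 25th to 75th percentile range. The helper names `fit_robust_scaler` and `robust_scale` are hypothetical:

```python
import statistics


def fit_robust_scaler(X):
    """Learn (median, IQR) for every feature column of the row-major matrix X.

    Quartiles use linear interpolation between order statistics
    (statistics.quantiles with method="inclusive"). A zero IQR is replaced by
    1.0 so that constant features do not cause a division by zero.
    """
    params = []
    for column in zip(*X):
        q1, _, q3 = statistics.quantiles(column, n=4, method="inclusive")
        iqr = q3 - q1
        params.append((statistics.median(column), iqr if iqr != 0 else 1.0))
    return params


def robust_scale(X, params):
    """Centre each feature on its median and divide by its IQR."""
    return [[(v - med) / iqr for v, (med, iqr) in zip(row, params)] for row in X]


# Hypothetical toy data: the first column has one outlier, the second is constant.
# Fit on training rows only, then reuse the same params for test rows.
X_train = [[1.0, 5.0], [2.0, 5.0], [3.0, 5.0], [4.0, 5.0], [100.0, 5.0]]
params = fit_robust_scaler(X_train)
print(params)  # [(3.0, 2.0), (5.0, 1.0)]
print([row[0] for row in robust_scale(X_train, params)])  # [-1.0, -0.5, 0.0, 0.5, 48.5]
```

The outlier (100.0) leaves the median and IQR unchanged, which is the usual reason to prefer this scaler over min-max or mean/standard-deviation scaling on skewed clinical measurements.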
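LASSO selects features by shrinking some coefficients exactly to zero. The abstract does not say which LASSO variant or penalty strength was used. Purely as an illustration, the sketch below fits a plain least-squares LASSO by cyclic coordinate descent. This is an assumption, since a classification study might instead use an L1-penalised logistic model. The data is hypothetical and centred, and the sketch keeps the features whose coefficients stay non-zero. The function names are hypothetical:

```python
def soft_threshold(z, gamma):
    """Shrink z towards zero by gamma; values within [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso(X, y, alpha, max_sweeps=1000, tol=1e-10):
    """Minimise (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 by coordinate descent.

    Assumes the columns of X and the target y are already centred, so no
    intercept is fitted.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column cannot be used
            # correlation of feature j with the residual that excludes feature j
            rho = sum(
                row[j] * (yi - sum(row[k] * w[k] for k in range(p) if k != j))
                for row, yi in zip(X, y)
            ) / n
            new_wj = soft_threshold(rho, alpha) / col_sq[j]
            max_change = max(max_change, abs(new_wj - w[j]))
            w[j] = new_wj
        if max_change < tol:
            break
    return w


# Hypothetical centred data: y depends strongly on feature 0, weakly on feature 1.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.1, -1.9, 1.9, -2.1]
w = lasso(X, y, alpha=0.5)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
print(selected)  # [0]
```

With `alpha=0.5`, the weak feature's coefficient is driven exactly to zero and feature 0 survives with a shrunken weight (about 1.5 instead of 2.0). Larger `alpha` values discard more features.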
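The abstract reports 5-fold CV without further detail. The sketch below shows one way such folds can be formed, assuming a stratified split, in which each fold keeps roughly the original class ratio. Stratification matters on an imbalanced dataset, but the abstract does not say whether the authors stratified. The helper `stratified_k_fold` is hypothetical:

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) for k folds that each keep roughly
    the class proportions of `labels`."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0  # continue the round-robin across classes to balance fold sizes
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for pos, idx in enumerate(idxs):
            folds[(offset + pos) % k].append(idx)
        offset += len(idxs)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


# Hypothetical labels: 40 negative and 10 positive cases.
y = [0] * 40 + [1] * 10
for train, test in stratified_k_fold(y, k=5):
    # Fit any scaler or SMOTE step on `train` only, then score on `test`.
    print(len(train), len(test), sum(y[i] for i in test))  # 40 10 2 on every fold
```

A common design choice is to fit the scaler and run SMOTE inside each training fold only, although the abstract does not confirm this ordering. This keeps synthetic samples out of the test fold, where they could inflate the reported accuracy.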