Machine learning framework with feature selection approaches for thyroid disease classification and associated risk factors identification

Cited by: 9
Authors
Azrin Sultana
Rakibul Islam
Affiliations
[1] American International University-Bangladesh, Department of Computer Science
Keywords
Thyroid disease prediction; Random forest; Healthcare; Machine learning; Feature selection
DOI
10.1186/s43067-023-00101-5
Abstract
Thyroid disease (TD) develops when the thyroid does not generate an adequate quantity of thyroid hormones, or when a lump or nodule emerges due to aberrant growth of the thyroid gland. Early detection is therefore important for preventing or minimizing the impact of this disease. In this study, different machine learning (ML) algorithms were combined with a scaling method, an oversampling technique, and various feature selection approaches to build an efficient framework for classifying TD. The proposed system also identifies significant risk factors of TD. The dataset was collected from the University of California Irvine (UCI) repository. In the preprocessing stage, the Synthetic Minority Oversampling Technique (SMOTE) was used to resolve the class imbalance problem, and a robust scaling technique was used to scale the dataset. The Boruta, Recursive Feature Elimination (RFE), and Least Absolute Shrinkage and Selection Operator (LASSO) approaches were used to select appropriate features. Six ML classifiers were trained: Support Vector Machine (SVM), AdaBoost (AB), Decision Tree (DT), Gradient Boosting (GB), K-Nearest Neighbors (KNN), and Random Forest (RF). The models were evaluated using 5-fold cross-validation (CV), and several performance metrics were compared across the algorithms. The RF classifier gave the most accurate results, with 99% accuracy. The proposed system can help physicians and patients classify TD and learn about its associated risk factors.
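The two preprocessing steps named in the abstract, robust scaling and SMOTE, can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names `robust_scale` and `smote`, the toy data, and the choice of the interquartile range as the scaling spread are assumptions made for illustration. In practice, library implementations such as scikit-learn's `RobustScaler` and imbalanced-learn's `SMOTE` would normally be used.

```python
import math
import random
import statistics


def robust_scale(column):
    """Scale one feature column as (x - median) / IQR.

    Illustrative helper (hypothetical name): the median and the
    interquartile range are less sensitive to outliers than mean/std.
    """
    q1, med, q3 = statistics.quantiles(column, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:  # no spread between quartiles: only center the column
        return [x - med for x in column]
    return [(x - med) / iqr for x in column]


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows by SMOTE-style interpolation.

    Each synthetic row lies on the segment between a randomly chosen
    minority row and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by Euclidean distance to `base`.
        others = [row for row in minority if row is not base]
        others.sort(key=lambda row: math.dist(row, base))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data, invented for this example only.
scaled = robust_scale([1, 2, 3, 4, 5])
print(scaled)  # [-1.0, -0.5, 0.0, 0.5, 1.0]

minority_rows = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_rows = smote(minority_rows, n_new=4, k=2)
print(len(new_rows))  # 4
```

Oversampling is usually fitted on the training folds only, so that synthetic rows derived from held-out samples do not leak into the cross-validation estimate; the abstract does not state how the authors ordered these steps.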
Related papers
50 in total
  • [21] Study on Feature Selection and Machine Learning Algorithms For Malay Sentiment Classification
    Alsaffar, Ahmed
    Omar, Nazlia
    PROCEEDINGS OF THE 2014 6TH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY AND MULTIMEDIA (ICIM), 2014, : 270 - 275
  • [22] Enhancing Software Requirements Classification with Machine Learning and Feature Selection Techniques
    Lanfear, Daniel
    Maleki, Mina
    Banitaan, Shadi
    SOFTWARE AND DATA ENGINEERING, SEDE 2024, 2025, 2244 : 14 - 30
  • [23] Machine learning approaches for classification of colorectal cancer with and without feature selection method on microarray data
    Nazari, Elham
    Aghemiri, Mehran
    Avan, Amir
    Mehrabian, Amin
    Tabesh, Hamed
    GENE REPORTS, 2021, 25
  • [24] Survey on Classification and Feature Selection Approaches for Disease Diagnosis
    Tripathi, Diwakar
    Manoj, I
    Prasanth, G. Raja
    Neeraja, K.
    Varma, Mohan Krishna
    Reddy, B. Ramachandra
    EMERGING RESEARCH IN DATA ENGINEERING SYSTEMS AND COMPUTER COMMUNICATIONS, CCODE 2019, 2020, 1054 : 567 - 576
  • [25] Ensemble Gain Ratio Feature Selection (EGFS) Model with Machine Learning and Data Mining Algorithms for Disease Risk Prediction
    Pasha, Syed Javeed
    Mohamed, E. Syed
    PROCEEDINGS OF THE 5TH INTERNATIONAL CONFERENCE ON INVENTIVE COMPUTATION TECHNOLOGIES (ICICT-2020), 2020, : 590 - 596
  • [26] Identification of risk factors for infection after mitral valve surgery through machine learning approaches
    Zhang, Ningjie
    Fan, Kexin
    Ji, Hongwen
    Ma, Xianjun
    Wu, Jingyi
    Huang, Yuanshuai
    Wang, Xinhua
    Gui, Rong
    Chen, Bingyu
    Zhang, Hui
    Zhang, Zugui
    Zhang, Xiufeng
    Gong, Zheng
    Wang, Yongjun
    FRONTIERS IN CARDIOVASCULAR MEDICINE, 2023, 10
  • [27] A machine learning approach for feature selection traffic classification using security analysis
    Shafiq, Muhammad
    Yu, Xiangzhan
    Bashir, Ali Kashif
    Chaudhry, Hassan Nazeer
    Wang, Dawei
    JOURNAL OF SUPERCOMPUTING, 2018, 74 (10): : 4867 - 4892
  • [28] Flight State Identification of a Self-Sensing Wing via an Improved Feature Selection Method and Machine Learning Approaches
    Chen, Xi
    Kopsaftopoulos, Fotis
    Wu, Qi
    Ren, He
    Chang, Fu-Kuo
    SENSORS, 2018, 18 (05)
  • [29] A Machine Learning Framework with Feature Selection for Floorplan Acceleration in IC Physical Design
    Zhang, Shu-Zheng
    Zhao, Zhen-Yu
    Feng, Chao-Chao
    Wang, Lei
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2020, 35 (02) : 468 - 474
  • [30] A machine learning approach for feature selection traffic classification using security analysis
    Muhammad Shafiq
    Xiangzhan Yu
    Ali Kashif Bashir
    Hassan Nazeer Chaudhry
    Dawei Wang
    The Journal of Supercomputing, 2018, 74 : 4867 - 4892