Bio inspired Ensemble Feature Selection (BEFS) Model with Machine Learning and Data Mining Algorithms for Disease Risk Prediction

Cited by: 3
Authors
Pasha, Syed Javeed [1]
Mohamed, E. Syed [2]
Affiliations
[1] BS Abdur Rahman Crescent Inst Sci & Technol, Dept Comp Applicat, Chennai, Tamil Nadu, India
[2] BS Abdur Rahman Crescent Inst Sci & Technol, Dept Comp Sci, Chennai, Tamil Nadu, India
Source
2019 5TH INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION, CONTROL AND AUTOMATION (ICCUBEA) | 2019
Keywords
Bio inspired ensemble feature selection (BEFS) model; machine learning; data mining; feature selection; health care; disease risk prediction; breast cancer risk prediction; genetic algorithm; random forest; logistic regression; BREAST-CANCER; DIAGNOSIS;
DOI
10.1109/iccubea47591.2019.9129304
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Machine learning (ML) and data mining (DM) algorithms have been used increasingly in recent years for disease risk prediction in healthcare. Traditional feature selection models are often combined with DM and ML algorithms to improve the accuracy of disease risk prediction. This study introduces a new Bio-inspired Ensemble Feature Selection (BEFS) model that is applied together with DM and ML algorithms. In the BEFS model, the most relevant and highly contributing features are determined with a bio-inspired algorithm (a genetic algorithm) and an ensemble algorithm (random forest). The important features obtained from the proposed model are then combined in various combinations and used with the DM and ML algorithms, here logistic regression (LR) and random forest (RF), with promising results. The experiment is carried out in R, a language widely used for ML, on the Breast Cancer Wisconsin (Diagnostic) dataset from the UCI (University of California, Irvine) ML repository. The highest accuracy attained with the BEFS model is 96.49%, the AUC (Area Under the Curve) is 96%, and the sensitivity is 98.11%. These results, which improve disease risk prediction, are higher than those of several existing works, while using only the six most relevant features out of the thirty-two features of the dataset.
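The genetic-algorithm half of the feature selection described in the abstract can be illustrated with a minimal, generic sketch. This is not the authors' R implementation: the function name `evolve_feature_mask`, the parameter values, and the `toy_fitness` scoring function are all assumptions made for illustration. In a real pipeline the fitness of a feature subset would typically be something like the cross-validated accuracy of a classifier trained on that subset.

```python
import random

# Hypothetical indices of the "useful" features, used only by the toy fitness.
INFORMATIVE = {0, 3, 7}

def toy_fitness(mask):
    """Illustrative stand-in for classifier accuracy: reward selecting
    informative features and lightly penalise the subset size."""
    hits = sum(mask[i] for i in INFORMATIVE)
    return hits - 0.1 * sum(mask)

def evolve_feature_mask(n_features, fitness, pop_size=20, generations=30,
                        mutation_rate=0.05, seed=0):
    """Return the best 0/1 feature mask found by a simple genetic algorithm."""
    rng = random.Random(seed)
    # Each individual is a bit mask: 1 keeps the feature, 0 drops it.
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    def pick(pop):
        # Tournament selection: the fitter of two random individuals.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(children) < pop_size:
            p1, p2 = pick(population), pick(population)
            cut = rng.randrange(1, n_features)  # single-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation with a small per-gene probability.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children
        best = max(population, key=fitness)
    return best

mask = evolve_feature_mask(10, toy_fitness)
selected = [i for i, keep in enumerate(mask) if keep]
print("selected feature indices:", selected)
```

Because the best mask is copied unchanged into each new generation, the best fitness never decreases from one generation to the next. The abstract additionally combines the genetic algorithm's output with random forest feature importance; that step is not shown here.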
Pages: 6
Related papers
50 in total
  • [31] An adaptive and enhanced framework for daily stock market prediction using feature selection and ensemble learning algorithms
    Sivri, Mahmut Sami
    Ustundag, Alp
    JOURNAL OF BUSINESS ANALYTICS, 2024, 7 (01) : 42 - 62
  • [32] A Supervised Machine Learning Approach using Different Feature Selection Techniques on Voice Datasets for Prediction of Parkinson's Disease
    Aich, Satyabrata
    Kim, Hee-Cheol
    Younga, Kim
    Hui, Kueh Lee
    Al-Absi, Ahmed Abdulhakim
    Sain, Mangal
    2019 21ST INTERNATIONAL CONFERENCE ON ADVANCED COMMUNICATION TECHNOLOGY (ICACT): ICT FOR 4TH INDUSTRIAL REVOLUTION, 2019, : 1116 - 1121
  • [33] Classification with machine learning algorithms after hybrid feature selection in imbalanced data sets
    Pulat, Meryem
    Kocakoc, Ipek Deveci
    OPERATIONS RESEARCH AND DECISIONS, 2024, 34 (04) : 157 - 183
  • [34] Data Driven Feature Selection for Machine Learning Algorithms in Computer Vision
    Zhang, Fan
    Li, Wei
    Zhang, Yifan
    Feng, Zhiyong
    IEEE INTERNET OF THINGS JOURNAL, 2018, 5 (06) : 4262 - 4272
  • [35] Prediction of Skin Disease Using Ensemble Data Mining Techniques and Feature Selection Method-a Comparative Study
    Verma, Anurag Kumar
    Pal, Saurabh
    Kumar, Surjeet
    APPLIED BIOCHEMISTRY AND BIOTECHNOLOGY, 2020, 190 (02) : 341 - 359
  • [36] A Gas Emission Prediction Model Based on Feature Selection and Improved Machine Learning
    Shao, Liangshan
    Zhang, Kun
    PROCESSES, 2023, 11 (03)
  • [37] Research on fire accident prediction and risk assessment algorithm based on data mining and machine learning
    Zhang, Ziyang
    Tan, Lingye
    Tiong, Robert
    ADVANCES IN CONTINUOUS AND DISCRETE MODELS, 2024, 2024 (01):
  • [38] Enhancing software defect prediction: a framework with improved feature selection and ensemble machine learning
    Ali, Misbah
    Mazhar, Tehseen
    Al-Rasheed, Amal
    Shahzad, Tariq
    Ghadi, Yazeed Yasin
    Khan, Muhammad Amir
    PEERJ COMPUTER SCIENCE, 2024, 10
  • [39] Feature selection with Fast Correlation-Based Filter for Breast cancer prediction and Classification using Machine Learning Algorithms
    Khourdifi, Youness
    Bahaj, Mohamed
    2018 INTERNATIONAL SYMPOSIUM ON ADVANCED ELECTRICAL AND COMMUNICATION TECHNOLOGIES (ISAECT), 2018,
  • [40] Citrus huanglongbing detection: A hyperspectral data-driven model integrating feature band selection with machine learning algorithms
    Yan, Kangting
    Song, Xiaobing
    Yang, Jing
    Xiao, Junqi
    Xu, Xidan
    Guo, Jun
    Zhu, Hongyun
    Lan, Yubin
    Zhang, Yali
    CROP PROTECTION, 2025, 188