Bio-inspired Ensemble Feature Selection (BEFS) Model with Machine Learning and Data Mining Algorithms for Disease Risk Prediction

Cited by: 3
Authors
Pasha, Syed Javeed [1]
Mohamed, E. Syed [2]
Affiliations
[1] B.S. Abdur Rahman Crescent Institute of Science and Technology, Department of Computer Applications, Chennai, Tamil Nadu, India
[2] B.S. Abdur Rahman Crescent Institute of Science and Technology, Department of Computer Science, Chennai, Tamil Nadu, India
Source
2019 5th International Conference on Computing, Communication, Control and Automation (ICCUBEA), 2019
Keywords
Bio-inspired ensemble feature selection (BEFS) model; machine learning; data mining; feature selection; health care; disease risk prediction; breast cancer risk prediction; genetic algorithm; random forest; logistic regression; breast cancer; diagnosis
DOI
10.1109/iccubea47591.2019.9129304
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Machine learning (ML) and data mining (DM) algorithms have been applied increasingly often in recent years to disease risk prediction problems in healthcare. Several traditional feature selection models have been combined with DM and ML algorithms to improve the accuracy of disease risk prediction. This study introduces a new Bio-inspired Ensemble Feature Selection (BEFS) model that is used together with DM and ML algorithms. In the BEFS model, the features that are most relevant to the prediction and contribute most to it are identified with a bio-inspired algorithm (a genetic algorithm) and an ensemble algorithm (the random forest algorithm). The important features obtained from the proposed model are then grouped into various combinations and fed to the DM and ML algorithms used here, logistic regression (LR) and random forest (RF), with promising results. The experiments are implemented in the R language using the Breast Cancer Wisconsin (Diagnostic) dataset from the UCI (University of California, Irvine) Machine Learning Repository. With the BEFS model, the highest accuracy attained is 96.49%, with an AUC (area under the curve) of 96% and a sensitivity of 98.11%. These results are higher than those reported in several existing works, even though only the six most relevant of the dataset's thirty-two features are used.
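The abstract describes BEFS only at a high level. The block below is a minimal sketch of the genetic-algorithm half of such a scheme. Each candidate solution is a bitmask over the features. Its fitness is hold-out accuracy minus a small per-feature penalty. Tournament selection, single-point crossover, bit-flip mutation and elitism evolve the population. This is not the authors' R implementation, and several parts are illustrative assumptions:

- the nearest-centroid classifier, which stands in for LR/RF;
- the synthetic data;
- the penalty weight and GA parameters;
- all function names (`make_data`, `centroid_accuracy`, `genetic_feature_selection`).

The random-forest importance ranking that BEFS combines with the GA is omitted.

```python
import random
from typing import Sequence, Tuple

Mask = Tuple[int, ...]  # 1 = feature kept, 0 = feature dropped


def make_data(n: int, n_features: int, informative: Sequence[int], rng: random.Random):
    """Hypothetical synthetic two-class data: only the `informative` columns differ by class."""
    X, y = [], []
    for i in range(n):
        label = i % 2  # alternate labels so every half of the data is balanced
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 4.0 * label  # shift the class-1 mean by 4 standard deviations
        X.append(row)
        y.append(label)
    return X, y


def centroid_accuracy(mask: Mask, X_tr, y_tr, X_te, y_te) -> float:
    """Hold-out accuracy of a nearest-centroid classifier using only the masked features.

    Assumption: this simple classifier replaces the LR/RF models used in the paper.
    """
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature subset is treated as useless
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X_tr, y_tr) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X_te, y_te):
        dist = {lab: sum((x[j] - c) ** 2 for j, c in zip(cols, centroids[lab])) for lab in (0, 1)}
        correct += min(dist, key=dist.get) == t
    return correct / len(y_te)


def genetic_feature_selection(X_tr, y_tr, X_te, y_te, n_features: int, rng: random.Random,
                              pop_size: int = 20, generations: int = 30,
                              mutation_rate: float = 0.1, penalty: float = 0.01):
    """Evolve feature bitmasks; returns (best_mask, best_fitness_per_generation)."""

    def fitness(mask: Mask) -> float:
        # Reward accuracy and lightly penalise each kept feature to favour small subsets.
        return centroid_accuracy(mask, X_tr, y_tr, X_te, y_te) - penalty * sum(mask)

    def tournament(pop, scores) -> Mask:
        # Binary tournament: the fitter of two randomly drawn masks becomes a parent.
        a, b = rng.randrange(len(pop)), rng.randrange(len(pop))
        return pop[a] if scores[a] >= scores[b] else pop[b]

    population = [tuple(rng.randint(0, 1) for _ in range(n_features)) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scores = [fitness(m) for m in population]
        best_idx = max(range(pop_size), key=scores.__getitem__)
        history.append(scores[best_idx])
        next_pop = [population[best_idx]]  # elitism: carry the best mask over unchanged
        while len(next_pop) < pop_size:
            p1, p2 = tournament(population, scores), tournament(population, scores)
            cut = rng.randrange(1, n_features)  # single-point crossover
            child = p1[:cut] + p2[cut:]
            child = tuple(1 - b if rng.random() < mutation_rate else b for b in child)  # bit-flip mutation
            next_pop.append(child)
        population = next_pop
    scores = [fitness(m) for m in population]
    best_idx = max(range(pop_size), key=scores.__getitem__)
    history.append(scores[best_idx])
    return population[best_idx], history


if __name__ == "__main__":
    rng = random.Random(42)
    X, y = make_data(200, 6, informative=[0, 1], rng=rng)
    X_tr, y_tr, X_te, y_te = X[:100], y[:100], X[100:], y[100:]
    best, history = genetic_feature_selection(X_tr, y_tr, X_te, y_te, 6, rng)
    print("selected features:", [j for j, b in enumerate(best) if b])
    print("hold-out accuracy:", centroid_accuracy(best, X_tr, y_tr, X_te, y_te))
```

Elitism makes the best fitness non-decreasing from one generation to the next. The per-feature penalty pushes the search toward compact subsets, in the spirit of the abstract's six-of-thirty-two result. A setting closer to the paper would call LR or RF inside `fitness` on the Wisconsin (Diagnostic) data. It could also combine the GA's surviving features with a random-forest importance ranking. Both are left out here to keep the sketch dependency-free.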
Pages: 6