A Review of Feature Selection Methods for Machine Learning-Based Disease Risk Prediction

Cited by: 269
Authors
Pudjihartono, Nicholas [1]
Fadason, Tayaza [1, 2]
Kempa-Liehr, Andreas W. [3]
O'Sullivan, Justin M. [1, 2, 4, 5, 6]
Affiliations
[1] Univ Auckland, Liggins Inst, Auckland, New Zealand
[2] Maurice Wilkins Ctr Mol Biodiscovery, Auckland, New Zealand
[3] Univ Auckland, Dept Engn Sci, Auckland, New Zealand
[4] Univ Southampton, MRC Lifecourse Epidemiol Unit, Southampton, England
[5] ASTAR, Singapore Inst Clin Sci, Singapore, Singapore
[6] Garvan Inst Med Res, Australian Parkinsons Mission, Sydney, NSW, Australia
Source
FRONTIERS IN BIOINFORMATICS | 2022 / Volume 2
Keywords
machine learning; feature selection (FS); risk prediction; disease risk prediction; statistical approaches; GENOME-WIDE ASSOCIATION; ROBUST FEATURE-SELECTION; FALSE DISCOVERY RATE; MUTUAL INFORMATION; RANDOM FORESTS; GENE; RELEVANCE; LOCI; GWAS; DIMENSIONALITY
DOI
10.3389/fbinf.2022.927312
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
Machine learning has shown utility in detecting patterns within large, unstructured, and complex datasets. One of the promising applications of machine learning is in precision medicine, where disease risk is predicted using patient genetic data. However, creating an accurate prediction model based on genotype data remains challenging due to the so-called "curse of dimensionality" (i.e., extensively larger number of features compared to the number of samples). Therefore, the generalizability of machine learning models benefits from feature selection, which aims to extract only the most "informative" features and remove noisy "non-informative," irrelevant and redundant features. In this article, we provide a general overview of the different feature selection methods, their advantages, disadvantages, and use cases, focusing on the detection of relevant features (i.e., SNPs) for disease risk prediction.
Pages: 17
Related papers
50 in total
  • [41] Maximum Relevance and Minimum Redundancy Feature Selection Methods for a Marketing Machine Learning Platform
    Zhao, Zhenyu
    Anand, Radhika
    Wang, Mallory
    2019 IEEE INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA 2019), 2019 : 442 - 452
  • [42] Role of Machine Learning-Based CT Body Composition in Risk Prediction and Prognostication: Current State and Future Directions
    Elhakim, Tarig
    Trinh, Kelly
    Mansur, Arian
    Bridge, Christopher
    Daye, Dania
    DIAGNOSTICS, 2023, 13 (05)
  • [43] Review and Empirical Analysis of Machine Learning-Based Software Effort Estimation
    Rahman, Mizanur
    Sarwar, Hasan
    Kader, MD. Abdul
    Goncalves, Teresa
    Tin, Ting Tin
    IEEE ACCESS, 2024, 12 : 85661 - 85680
  • [44] Machine Learning-Based Prediction of Death and Hospitalization in Patients With Implantable Cardioverter Defibrillators
    Rosman, Lindsey
    Lampert, Rachel
    Wang, Kaicheng
    Gehi, Anil K.
    Dziura, James
    Salmoirago-Blotcher, Elena
    Brandt, Cynthia
    Sears, Samuel F.
    Burg, Matthew
    JOURNAL OF THE AMERICAN COLLEGE OF CARDIOLOGY, 2025, 85 (01) : 42 - 55
  • [45] Machine Learning-Based Reconstruction and Prediction of Groundwater Time Series in the Allertal, Germany
    Tran, Tuong Vi
    Peche, Aaron
    Kringel, Robert
    Broemme, Katrin
    Altfelder, Sven
    WATER, 2025, 17 (03)
  • [46] MutPred Splice: machine learning-based prediction of exonic variants that disrupt splicing
    Mort, Matthew
    Sterne-Weiler, Timothy
    Li, Biao
    Ball, Edward V.
    Cooper, David N.
    Radivojac, Predrag
    Sanford, Jeremy R.
    Mooney, Sean D.
    GENOME BIOLOGY, 2014, 15 (01)
  • [47] Relevance assignation feature selection method based on mutual information for machine learning
    Gao, Liyang
    Wu, Weiguo
    KNOWLEDGE-BASED SYSTEMS, 2020, 209
  • [48] A prediction study on the occurrence risk of heart disease in older hypertensive patients based on machine learning
    Si, Fei
    Liu, Qian
    Yu, Jing
    BMC GERIATRICS, 2025, 25 (01)
  • [49] Breast cancer risk prediction using machine learning: a systematic review
    Hussain, Sadam
    Ali, Mansoor
    Naseem, Usman
    Nezhadmoghadam, Fahimeh
    Jatoi, Munsif Ali
    Gulliver, T. Aaron
    Tamez-Pena, Jose Gerardo
    FRONTIERS IN ONCOLOGY, 2024, 14
  • [50] Risk Prediction and Machine Learning A Case-Based Overview
    Balczewski, Emily A. A.
    Cao, Jie
    Singh, Karandeep
    CLINICAL JOURNAL OF THE AMERICAN SOCIETY OF NEPHROLOGY, 2023, 18 (04) : 524 - 526