A Review of Feature Selection Methods for Machine Learning-Based Disease Risk Prediction

Cited by: 269
Authors
Pudjihartono, Nicholas [1]
Fadason, Tayaza [1, 2]
Kempa-Liehr, Andreas W. [3]
O'Sullivan, Justin M. [1, 2, 4, 5, 6]
Affiliations
[1] Univ Auckland, Liggins Inst, Auckland, New Zealand
[2] Maurice Wilkins Ctr Mol Biodiscovery, Auckland, New Zealand
[3] Univ Auckland, Dept Engn Sci, Auckland, New Zealand
[4] Univ Southampton, MRC Lifecourse Epidemiol Unit, Southampton, England
[5] ASTAR, Singapore Inst Clin Sci, Singapore, Singapore
[6] Garvan Inst Med Res, Australian Parkinsons Mission, Sydney, NSW, Australia
Source
FRONTIERS IN BIOINFORMATICS | 2022 / Volume 2
Keywords
machine learning; feature selection (FS); risk prediction; disease risk prediction; statistical approaches; GENOME-WIDE ASSOCIATION; ROBUST FEATURE-SELECTION; FALSE DISCOVERY RATE; MUTUAL INFORMATION; RANDOM FORESTS; GENE; RELEVANCE; LOCI; GWAS; DIMENSIONALITY
DOI
10.3389/fbinf.2022.927312
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
Machine learning has shown utility in detecting patterns within large, unstructured, and complex datasets. One of the promising applications of machine learning is in precision medicine, where disease risk is predicted using patient genetic data. However, creating an accurate prediction model based on genotype data remains challenging due to the so-called "curse of dimensionality" (i.e., a far larger number of features than samples). Therefore, the generalizability of machine learning models benefits from feature selection, which aims to extract only the most "informative" features and remove noisy, "non-informative," irrelevant, and redundant features. In this article, we provide a general overview of the different feature selection methods, their advantages, disadvantages, and use cases, focusing on the detection of relevant features (i.e., SNPs) for disease risk prediction.
Pages: 17
Related papers
50 in total
  • [21] Feature selection in machine learning: A new perspective
    Cai, Jie
    Luo, Jiawei
    Wang, Shulin
    Yang, Sheng
    NEUROCOMPUTING, 2018, 300 : 70 - 79
  • [23] Prediction of pasture yield using machine learning-based optical sensing: a systematic review
    Stumpe, Christoph
    Leukel, Joerg
    Zimpel, Tobias
    PRECISION AGRICULTURE, 2024, 25 (01) : 430 - 459
  • [24] Deep learning-based polygenic risk analysis for Alzheimer's disease prediction
    Zhou, Xiaopu
    Chen, Yu
    Ip, Fanny C. F.
    Jiang, Yuanbing
    Cao, Han
    Lv, Ge
    Zhong, Huan
    Chen, Jiahang
    Ye, Tao
    Chen, Yuewen
    Zhang, Yulin
    Ma, Shuangshuang
    Lo, Ronnie M. N.
    Tong, Estella P. S.
    Mok, Vincent C. T.
    Kwok, Timothy C. Y.
    Guo, Qihao
    Mok, Kin Y.
    Shoai, Maryam
    Hardy, John
    Chen, Lei
    Fu, Amy K. Y.
    Ip, Nancy Y.
    COMMUNICATIONS MEDICINE, 2023, 3 (01):
  • [25] Classification of lung cancer using ensemble-based feature selection and machine learning methods
    Cai, Zhihua
    Xu, Dong
    Zhang, Qing
    Zhang, Jiexia
    Ngai, Sai-Ming
    Shao, Jianlin
    MOLECULAR BIOSYSTEMS, 2015, 11 (03) : 791 - 800
  • [26] A review of feature selection methods on synthetic data
    Bolon-Canedo, Veronica
    Sanchez-Marono, Noelia
    Alonso-Betanzos, Amparo
    KNOWLEDGE AND INFORMATION SYSTEMS, 2013, 34 (03) : 483 - 519
  • [27] Development of a machine learning-based risk prediction model for cerebral infarction and comparison with nomogram model
    Li, Xuewen
    Wang, Yiting
    Xu, Jiancheng
    JOURNAL OF AFFECTIVE DISORDERS, 2022, 314 : 341 - 348
  • [28] Machine Learning-Based Decision Support Framework for Construction Injury Severity Prediction and Risk Mitigation
    Gondia, Ahmed
    Ezzeldin, Mohamed
    El-Dakhakhni, Wael
    ASCE-ASME JOURNAL OF RISK AND UNCERTAINTY IN ENGINEERING SYSTEMS PART A-CIVIL ENGINEERING, 2022, 8 (03)
  • [29] Systematic review identifies the design and methodological conduct of studies on machine learning-based prediction models
    Navarro, Constanza L. Andaur
    Damen, Johanna A. A.
    van Smeden, Maarten
    Takada, Toshihiko
    Nijman, Steven W. J.
    Dhiman, Paula
    Ma, Jie
    Collins, Gary S.
    Bajpai, Ram
    Riley, Richard D.
    Moons, Karel G. M.
    Hooft, Lotty
    JOURNAL OF CLINICAL EPIDEMIOLOGY, 2023, 154 : 8 - 22
  • [30] Machine Learning-Based Water Level Prediction in Lake Erie
    Wang, Qi
    Wang, Song
    WATER, 2020, 12 (10) : 1 - 14