A Review of Feature Selection Methods for Machine Learning-Based Disease Risk Prediction

Cited by: 269
Authors
Pudjihartono, Nicholas [1]
Fadason, Tayaza [1, 2]
Kempa-Liehr, Andreas W. [3]
O'Sullivan, Justin M. [1, 2, 4, 5, 6]
Affiliations
[1] Univ Auckland, Liggins Inst, Auckland, New Zealand
[2] Maurice Wilkins Ctr Mol Biodiscovery, Auckland, New Zealand
[3] Univ Auckland, Dept Engn Sci, Auckland, New Zealand
[4] Univ Southampton, MRC Lifecourse Epidemiol Unit, Southampton, England
[5] ASTAR, Singapore Inst Clin Sci, Singapore, Singapore
[6] Garvan Inst Med Res, Australian Parkinsons Mission, Sydney, NSW, Australia
Source
FRONTIERS IN BIOINFORMATICS | 2022 / Volume 2
Keywords
machine learning; feature selection (FS); risk prediction; disease risk prediction; statistical approaches; GENOME-WIDE ASSOCIATION; ROBUST FEATURE-SELECTION; FALSE DISCOVERY RATE; MUTUAL INFORMATION; RANDOM FORESTS; GENE; RELEVANCE; LOCI; GWAS; DIMENSIONALITY
DOI
10.3389/fbinf.2022.927312
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
Machine learning has shown utility in detecting patterns within large, unstructured, and complex datasets. One of the promising applications of machine learning is in precision medicine, where disease risk is predicted using patient genetic data. However, creating an accurate prediction model based on genotype data remains challenging due to the so-called "curse of dimensionality" (i.e., a far larger number of features than samples). Therefore, the generalizability of machine learning models benefits from feature selection, which aims to extract only the most "informative" features and remove noisy, "non-informative," irrelevant, and redundant features. In this article, we provide a general overview of the different feature selection methods, their advantages, disadvantages, and use cases, focusing on the detection of relevant features (i.e., SNPs) for disease risk prediction.
Pages: 17
Related papers
50 in total
  • [1] Application of information theoretic feature selection and machine learning methods for the development of genetic risk prediction models
    Jalali-najafabadi, Farideh
    Stadler, Michael
    Dand, Nick
    Jadon, Deepak
    Soomro, Mehreen
    Ho, Pauline
    Marzo-Ortega, Helen
    Helliwell, Philip
    Korendowych, Eleanor
    Simpson, Michael A.
    Packham, Jonathan
    Smith, Catherine H.
    Barker, Jonathan N.
    McHugh, Neil
    Warren, Richard B.
    Barton, Anne
    Bowes, John
    SCIENTIFIC REPORTS, 2021, 11 (01)
  • [2] A review of feature selection methods based on mutual information
    Vergara, Jorge R.
    Estevez, Pablo A.
    NEURAL COMPUTING & APPLICATIONS, 2014, 24 (01) : 175 - 186
  • [3] Risk prediction of coal mine rock burst based on machine learning and feature selection algorithm
    Miao, Dejun
    Yao, Kaixin
    Wang, Wenhao
    Liu, Lu
    Sui, Xiuhua
    GEORISK-ASSESSMENT AND MANAGEMENT OF RISK FOR ENGINEERED SYSTEMS AND GEOHAZARDS, 2024, 18 (04) : 868 - 881
  • [4] Machine learning-based risk prediction of acute kidney disease and hospital mortality in older patients
    Wang, Xinyuan
    Xu, Lingyu
    Guan, Chen
    Xu, Daojun
    Che, Lin
    Wang, Yanfei
    Man, Xiaofei
    Li, Chenyu
    Xu, Yan
    FRONTIERS IN MEDICINE, 2024, 11
  • [5] A review of gene selection methods based on machine learning approaches
    Lee, Hajoung
    Kim, Jaejik
    KOREAN JOURNAL OF APPLIED STATISTICS, 2022, 35 (05) : 667 - 684
  • [6] A review of feature selection methods based on mutual information
    Vergara, Jorge R.
    Estévez, Pablo A.
    NEURAL COMPUTING AND APPLICATIONS, 2014, 24 : 175 - 186
  • [7] Ensemble learning-based stability improvement method for feature selection towards performance prediction
    Xiang, Feng
    Zhao, Yulong
    Zhang, Meng
    Zuo, Ying
    Zou, Xiaofu
    Tao, Fei
    JOURNAL OF MANUFACTURING SYSTEMS, 2024, 74 : 55 - 67
  • [8] Ensemble Gain Ratio Feature Selection (EGFS) Model with Machine Learning and Data Mining Algorithms for Disease Risk Prediction
    Pasha, Syed Javeed
    Mohamed, E. Syed
    PROCEEDINGS OF THE 5TH INTERNATIONAL CONFERENCE ON INVENTIVE COMPUTATION TECHNOLOGIES (ICICT-2020), 2020 : 590 - 596
  • [9] Bio inspired Ensemble Feature Selection (BEFS) Model with Machine Learning and Data Mining Algorithms for Disease Risk Prediction
    Pasha, Syed Javeed
    Mohamed, E. Syed
    2019 5TH INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION, CONTROL AND AUTOMATION (ICCUBEA), 2019
  • [10] Risk estimation and risk prediction using machine-learning methods
    Kruppa, Jochen
    Ziegler, Andreas
    Koenig, Inke R.
    HUMAN GENETICS, 2012, 131 (10) : 1639 - 1654