Filter and Wrapper Stacking Ensemble (FWSE): a robust approach for reliable biomarker discovery in high-dimensional omics data

Cited by: 4
Authors
Budhraja, Sugam [1]
Doborjeh, Maryam [2]
Singh, Balkaran [1]
Tan, Samuel [3]
Doborjeh, Zohreh [4, 5]
Lai, Edmund [2]
Merkin, Alexander [6, 7]
Lee, Jimmy [8, 9]
Goh, Wilson [9]
Kasabov, Nikola [2, 10]
Affiliations
[1] Auckland Univ Technol, Auckland, New Zealand
[2] Auckland Univ Technol, Sch Engn Comp & Math Sci, Auckland, New Zealand
[3] Nanyang Technol Univ, Singapore, Singapore
[4] Univ Auckland, Ctr Brain Res, Auckland, New Zealand
[5] Univ Waikato, Sch Psychol, Hamilton, New Zealand
[6] AUT Univ, Inst Stroke & Appl Neurosci, Auckland, New Zealand
[7] AUT Univ, Dept Psychotherapy & Counselling, Auckland, New Zealand
[8] Nanyang Technol Univ, Inst Mental Hlth, Singapore, Singapore
[9] Nanyang Technol Univ, Lee Kong Chian Sch Med, Singapore, Singapore
[10] KEDRI, Auckland, New Zealand
Funding
National Research Foundation, Singapore
Keywords
feature selection; biomarker discovery; ensemble learning; high-dimensional data; genomics; proteomics; GENE SELECTION; CLASSIFICATION; RISK; PSYCHOSIS; PROMOTES; SYSTEMS; TOOL; RNA;
DOI
10.1093/bib/bbad382
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Selecting informative features, such as accurate biomarkers for disease diagnosis, prognosis and response to treatment, is an essential task in bioinformatics. Medical data often contain thousands of features, and identifying potential biomarkers is challenging because of the small number of samples, method dependence and non-reproducibility. This paper proposes a novel ensemble feature selection method, named Filter and Wrapper Stacking Ensemble (FWSE), to identify reproducible biomarkers from high-dimensional omics data. In FWSE, filter feature selection methods are run on numerous subsets of the data to eliminate irrelevant features, and wrapper feature selection methods are then applied to rank the top features. The method was validated on four high-dimensional medical datasets related to mental illnesses and cancer. The results indicate that the features selected by FWSE are stable and statistically more significant than those obtained by existing methods, while also demonstrating biological relevance. Furthermore, FWSE is a generic method, applicable to various high-dimensional datasets in the fields of machine intelligence and bioinformatics.
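The two-stage idea in the abstract (filters on many data subsets to discard irrelevant features, then a wrapper to rank the survivors) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the filters, the wrappers, the number of subsets or any thresholds, so everything below is an assumption chosen for illustration. The filter is a Welch-style t score, the wrapper is greedy forward selection around a nearest-centroid classifier scored by leave-one-out accuracy, and all function names, parameters and the synthetic data are hypothetical.

```python
import random
import statistics


def filter_score(xs, ys):
    """Absolute Welch-style t statistic of one feature between classes 0 and 1."""
    a = [x for x, label in zip(xs, ys) if label == 0]
    b = [x for x, label in zip(xs, ys) if label == 1]
    if len(a) < 2 or len(b) < 2:
        return 0.0
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return abs(statistics.mean(a) - statistics.mean(b)) / se if se > 0 else 0.0


def filter_stage(X, y, n_subsets=30, subset_frac=0.8, keep=5, seed=0):
    """Score features on random sample subsets; keep the most frequently top-ranked."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    size = min(n, max(4, int(subset_frac * n)))
    votes = [0] * p
    for _ in range(n_subsets):
        idx = rng.sample(range(n), size)
        ys = [y[i] for i in idx]
        scores = [filter_score([X[i][j] for i in idx], ys) for j in range(p)]
        for j in sorted(range(p), key=lambda j: -scores[j])[:keep]:
            votes[j] += 1
    # Features that reach the top `keep` in the most subsets survive.
    return sorted(range(p), key=lambda j: -votes[j])[:keep]


def loo_accuracy(X, y, feats):
    """Leave-one-out accuracy of a nearest-centroid classifier using `feats`."""
    correct = 0
    for t in range(len(X)):
        dists = {}
        for label in set(y):
            rows = [X[i] for i in range(len(X)) if i != t and y[i] == label]
            if not rows:
                continue
            centroid = [statistics.mean(r[j] for r in rows) for j in feats]
            dists[label] = sum((X[t][j] - c) ** 2 for j, c in zip(feats, centroid))
        correct += min(dists, key=dists.get) == y[t]
    return correct / len(X)


def wrapper_stage(X, y, candidates):
    """Greedy forward selection; the order of inclusion is the final ranking."""
    ranked, remaining = [], list(candidates)
    while remaining:
        best = max(remaining, key=lambda j: loo_accuracy(X, y, ranked + [j]))
        ranked.append(best)
        remaining.remove(best)
    return ranked


def make_toy_data(n=40, p=50, informative=(0, 1), shift=4.0, seed=1):
    """Synthetic two-class data: only the `informative` columns differ by class."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(shift * y[i] if j in informative else 0.0, 1.0)
          for j in range(p)] for i in range(n)]
    return X, y


X, y = make_toy_data()
survivors = filter_stage(X, y)            # stage 1: filter on many subsets
ranking = wrapper_stage(X, y, survivors)  # stage 2: wrapper ranks the survivors
print("filter survivors:", survivors)
print("wrapper ranking:", ranking)
```

Counting how often a feature survives across subsets is one simple way to favour stable features over ones that score well on a single split. A real analysis would also keep the selection inside a cross-validation loop to avoid selection bias.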
Pages: 17