Robustness of chemometrics-based feature selection methods in early cancer detection and biomarker discovery

Cited by: 9
Authors
Lee, Hae Woo [1]
Lawton, Carl [1]
Na, Young Jeong [2]
Yoon, Seongkyu [1]
Affiliations
[1] Univ Massachusetts, Dept Chem Engn, Lowell, MA 01854 USA
[2] Harvard Univ, Massachusetts Gen Hosp, Sch Med, Boston, MA USA
Keywords
biomarker discovery; chemometrics; early detection; feature selection; omics; ovarian cancer; reproducibility; stability; CARLO CROSS-VALIDATION; SELDI-TOF MS; VARIABLE SELECTION; OVARIAN-CANCER; BREAST-CANCER; WAVELENGTH SELECTION; MULTIVARIATE CALIBRATION; MASS-SPECTROMETRY; SERUM BIOMARKERS; STABILITY;
DOI
10.1515/sagmb-2012-0067
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
In omics studies aimed at the early detection and diagnosis of cancer, bioinformatics tools play a significant role both in analyzing high-dimensional, complex datasets and in identifying a small set of biomarkers. In many cases, however, the robustness and consistency of the discovered biomarker sets are uncertain, since feature selection methods often lead to irreproducible results. To address this, both the stability and the classification power of several chemometrics-based feature selection algorithms were evaluated using the Monte Carlo sampling technique, with the aim of finding the most suitable feature selection methods for early cancer detection and biomarker discovery. To this end, two datasets were analyzed, comprising MALDI-TOF-MS and LC/TOF-MS spectra measured on serum samples for the diagnosis of ovarian cancer. Using these datasets, the stability and the classification power of multiple feature subsets found by different feature selection methods were quantified by varying either the number of selected features or the number of samples in the training set, with special emphasis placed on stability. The results show that high consistency does not necessarily guarantee high predictive power. In addition, differences in stability, as well as agreement in feature lists between feature selection methods, depend on several factors, such as the number of available samples, the feature size, and the quality of the information in the dataset. Among the tested methods, only the variable importance in projection (VIP)-based method shows complementary properties, providing feature subsets that are both highly consistent and accurate. In addition, successive projection analysis (SPA) was excellent with regard to maintaining high stability over a wide range of experimental conditions.
The stability of several feature selection methods is highly variable, which underscores the importance of choosing a feature selection method carefully. Therefore, selected features should be evaluated not only by classification accuracy but also by stability measurements, to improve the reliability of biomarker discovery.
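
The abstract describes quantifying the stability of selected feature subsets under Monte Carlo sampling but does not state the stability index or the exact resampling scheme. The following is a minimal Python sketch of the general idea only, under assumptions that are not taken from the paper: a Welch t-statistic filter stands in for the chemometrics-based selectors (it is not VIP or SPA), stability is measured as the mean pairwise Jaccard index, and the function names `select_top_k`, `jaccard`, and `monte_carlo_stability` are hypothetical.

```python
import numpy as np
from itertools import combinations


def select_top_k(X, y, k):
    """Return the indices of the k features with the largest |Welch t-statistic|.

    Assumption: a simple univariate filter used as a stand-in selector;
    it is not one of the methods evaluated in the paper.
    """
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    t = np.abs(a.mean(axis=0) - b.mean(axis=0)) / (se + 1e-12)
    return set(np.argsort(t)[::-1][:k].tolist())


def jaccard(s1, s2):
    """Jaccard index of two feature-index sets (1.0 if both are empty)."""
    union = s1 | s2
    return len(s1 & s2) / len(union) if union else 1.0


def monte_carlo_stability(X, y, k, n_runs=50, train_frac=0.7, seed=0):
    """Mean pairwise Jaccard index of feature subsets selected on
    repeated class-stratified random training subsamples.

    Assumption: binary labels coded 0/1, with at least two training
    samples per class after subsampling.
    """
    rng = np.random.default_rng(seed)
    subsets = []
    for _ in range(n_runs):
        # Draw a stratified training subsample without replacement.
        idx = np.concatenate([
            rng.choice(
                np.flatnonzero(y == c),
                size=int(round(train_frac * np.sum(y == c))),
                replace=False,
            )
            for c in (0, 1)
        ])
        subsets.append(select_top_k(X[idx], y[idx], k))
    pairs = list(combinations(subsets, 2))
    return sum(jaccard(p, q) for p, q in pairs) / len(pairs)


# Synthetic illustration (not the ovarian cancer data): 60 samples,
# 200 features, of which the first 5 are shifted in class 1.
demo_rng = np.random.default_rng(1)
y_demo = np.repeat([0, 1], 30)
X_demo = demo_rng.normal(size=(60, 200))
X_demo[y_demo == 1, :5] += 3.0
print("stability:", monte_carlo_stability(X_demo, y_demo, k=5))
```

Repeating the call while varying `k` or `train_frac` mirrors the two factors the abstract says were varied (number of selected features and training set size); classification power would need to be assessed separately on the held-out samples of each run.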
Pages: 207-223
Number of pages: 17