Stability of Feature Selection in Multi-Omics Data Analysis

Authors
Lukaszuk, Tomasz [1]
Krawczuk, Jerzy [1]
Zyla, Kamil [2]
Kesik, Jacek [2]
Affiliations
[1] Bialystok Tech Univ, Fac Comp Sci, Wiejska 45A, PL-15351 Bialystok, Poland
[2] Lublin Univ Technol, Fac Elect Engn & Comp Sci, Dept Comp Sci, Nadbystrzycka 36B, PL-20618 Lublin, Poland
Source
APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 23
Keywords
multi-omics; high-dimensional data; cancer genomics; feature selection; stability; L1 regularization; classification; algorithms
DOI
10.3390/app142311103
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
In multi-omics data analysis, the stability of feature selection is critical for reliable biomarker discovery and clinical applications. This study investigates the stability of feature-selection methods across various cancer types using 15 datasets from The Cancer Genome Atlas (TCGA). We employed classifiers with embedded feature selection, namely Support Vector Machines (SVM), Logistic Regression (LR), and Lasso regression, each incorporating L1 regularization. Using five-fold cross-validation, we measured feature-selection stability and assessed the accuracy of predicting TP53 mutations, a known indicator of poor clinical outcomes in cancer patients. All three classifiers achieved their highest feature-selection stability, measured by the Nogueira metric, with higher regularization (fewer selected features), while lower regularization generally reduced stability across all omics layers. Stability also differed between omics layers: the miRNA layer consistently exhibited the highest stability across classifiers, while the mutation and RNA layers were generally less stable, particularly with lower regularization. This work highlights the importance of careful feature selection and validation in high-dimensional datasets to enhance the robustness and reliability of multi-omics analyses.
Pages: 16
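The abstract names the Nogueira metric as its stability measure but does not spell it out. As a rough illustration only, the sketch below follows the commonly cited definition of that estimator: one minus the ratio of the average per-feature selection variance to the variance expected from random selection of the same average subset size. The function name `nogueira_stability` and the 0/1 list-of-lists input format are assumptions made for this example and are not taken from the paper.

```python
def nogueira_stability(selections):
    """Estimate feature-selection stability from repeated selections.

    selections: list of M equal-length lists of 0/1 flags, one list per
    run (for example, one per cross-validation fold). A 1 means the
    feature was selected in that run. This input format is an assumption
    of this sketch.
    """
    m = len(selections)
    if m < 2:
        raise ValueError("at least two runs are needed")
    d = len(selections[0])
    if any(len(row) != d for row in selections):
        raise ValueError("all runs must cover the same features")

    # Fraction of runs in which each feature was selected.
    freqs = [sum(row[f] for row in selections) / m for f in range(d)]
    # Unbiased sample variance of each feature's 0/1 selection flag.
    variances = [m / (m - 1) * p * (1 - p) for p in freqs]
    # Average number of features selected per run.
    k_bar = sum(freqs)
    # Variance expected if k_bar features were picked at random.
    denom = (k_bar / d) * (1 - k_bar / d)
    if denom == 0:
        raise ValueError("undefined when every run selects none or all features")
    return 1 - (sum(variances) / d) / denom


# Identical selections in every fold give the maximum value of 1.
folds = [[1, 0, 0, 1]] * 5
print(nogueira_stability(folds))
```

In a setting like the one the abstract describes, each row could hold the non-zero-coefficient mask of an L1-regularized model fitted on one training fold; values near 1 would then mean the folds agree on which features to keep.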