Stability of Feature Selection in Multi-Omics Data Analysis

Cited by: 0
Authors
Lukaszuk, Tomasz [1]
Krawczuk, Jerzy [1]
Zyla, Kamil [2]
Kesik, Jacek [2]
Affiliations
[1] Bialystok Tech Univ, Fac Comp Sci, Wiejska 45A, PL-15351 Bialystok, Poland
[2] Lublin Univ Technol, Fac Elect Engn & Comp Sci, Dept Comp Sci, Nadbystrzycka 36B, PL-20618 Lublin, Poland
Source
APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 23
Keywords
multi-omics; high-dimensional data; cancer genomics; feature selection; stability; L1 regularization; CLASSIFICATION; ALGORITHMS
DOI
10.3390/app142311103
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
In the rapidly evolving field of multi-omics data analysis, understanding the stability of feature selection is critical for reliable biomarker discovery and clinical applications. This study investigates the stability of feature-selection methods across various cancer types using 15 datasets from The Cancer Genome Atlas (TCGA). We employed classifiers with embedded feature selection, including Support Vector Machines (SVM), Logistic Regression (LR), and Lasso regression, each incorporating L1 regularization. Through a comprehensive evaluation using five-fold cross-validation, we measured feature-selection stability and assessed the accuracy of predicting TP53 mutations, a known indicator of poor clinical outcomes in cancer patients. For all three classifiers, feature-selection stability, measured by the Nogueira metric, was highest with higher regularization (fewer selected features), while lower regularization generally resulted in decreased stability across all omics layers. Our findings indicate differences in feature stability across the omics layers: the miRNA layer consistently exhibited the highest stability across classifiers, while the mutation and RNA layers were generally less stable, particularly with lower regularization. This work highlights the importance of careful feature selection and validation in high-dimensional datasets to enhance the robustness and reliability of multi-omics analyses.
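To make the stability measure named in the abstract concrete, here is a minimal sketch of the Nogueira stability estimator. It assumes that the features selected in each cross-validation fold are encoded as one row of 0/1 indicators; the function name `nogueira_stability` and this input layout are illustrative assumptions, not the authors' code.

```python
def nogueira_stability(selections):
    """Estimate feature-selection stability (Nogueira et al. estimator).

    selections: list of M rows (one per fold or run), each a list of d
    values in {0, 1}, where 1 means the feature was selected in that run.
    This layout is an assumption made for illustration.
    """
    m = len(selections)
    if m < 2:
        raise ValueError("need at least two selection runs")
    d = len(selections[0])

    # Fraction of runs in which each feature was selected.
    freq = [sum(row[f] for row in selections) / m for f in range(d)]

    # Unbiased sample variance of each feature's selection indicator.
    variances = [m / (m - 1) * p * (1 - p) for p in freq]

    # Average number of selected features per run.
    k_bar = sum(sum(row) for row in selections) / m

    # Normalizer: variance expected if k_bar features were picked at random.
    denom = (k_bar / d) * (1 - k_bar / d)
    if denom == 0:
        raise ValueError("undefined when no or all features are selected")

    return 1 - (sum(variances) / d) / denom


# Five folds that always pick the same two of six features.
identical = [[1, 1, 0, 0, 0, 0] for _ in range(5)]
# Two runs that pick completely disjoint halves of four features.
disjoint = [[1, 1, 0, 0], [0, 0, 1, 1]]

stable_score = nogueira_stability(identical)
unstable_score = nogueira_stability(disjoint)
```

Identical selections in every fold give the maximum value of 1, while selections that disagree more than random choice would give values below 0. In a setup like the one described, the rows would come from the nonzero coefficients of an L1-regularized model fitted on each fold.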
Pages: 16
Related papers
50 in total
  • [31] Network analysis with multi-omics data using graphical LASSO
    Park, Jaehyun
    Won, Sungho
    GENETIC EPIDEMIOLOGY, 2020, 44 (05) : 509 - 509
  • [32] Integration strategies of multi-omics data for machine learning analysis
    Picard, Milan
    Scott-Boyer, Marie-Pier
    Bodein, Antoine
    Perin, Olivier
    Droit, Arnaud
    COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL, 2021, 19 : 3735 - 3746
  • [33] Integrating FAIR Experimental Metadata for Multi-omics Data Analysis
    Doniparthi, Gajendra
    Mühlhaus, Timo
    Deßloch, Stefan
    Datenbank-Spektrum, 2024, 24 (02) : 107 - 115
  • [34] Directional integration and pathway enrichment analysis for multi-omics data
    Slobodyanyuk, Mykhaylo
    Bahcheli, Alexander T.
    Klein, Zoe P.
    Bayati, Masroor
    Strug, Lisa J.
    Reimand, Juri
    NATURE COMMUNICATIONS, 2024, 15 (01)
  • [35] Omics Pipe: a community-based framework for reproducible multi-omics data analysis
    Fisch, Kathleen M.
    Meissner, Tobias
    Gioia, Louis
    Ducom, Jean-Christophe
    Carland, Tristan M.
    Loguercio, Salvatore
    Su, Andrew I.
    BIOINFORMATICS, 2015, 31 (11) : 1724 - 1728
  • [36] The Omics Dashboard for Interactive Exploration of Metabolomics and Multi-Omics Data
    Paley, Suzanne
    Karp, Peter D.
    METABOLITES, 2024, 14 (01)
  • [37] A feature extraction framework for discovering pancancer driver genes based on multi-omics data
    Xiaomeng Xue
    Feng Li
    Junliang Shang
    Lingyun Dai
    Daohui Ge
    Qianqian Ren
    Quantitative Biology, 2024, 12 (02) : 173 - 181
  • [38] Multi-omics Visualization Platform: An extensible Galaxy plug-in for multi-omics data visualization and exploration
    McGowan, Thomas
    Johnson, James E.
    Kumar, Praveen
    Sajulga, Ray
    Mehta, Subina
    Jagtap, Pratik D.
    Griffin, Timothy J.
    GIGASCIENCE, 2020, 9 (04)
  • [39] Comparison of five supervised feature selection algorithms leading to top features and gene signatures from multi-omics data in cancer
    Bhadra, Tapas
    Mallik, Saurav
    Hasan, Neaj
    Zhao, Zhongming
    BMC BIOINFORMATICS, 2022, 23 (SUPPL 3)
  • [40] FAIR Data Cube, a FAIR data infrastructure for integrated multi-omics data analysis
    Xiaofeng Liao
    Thomas H.A. Ederveen
    Anna Niehues
    Casper de Visser
    Junda Huang
    Firdaws Badmus
    Cenna Doornbos
    Yuliia Orlova
    Purva Kulkarni
    K. Joeri van der Velde
    Morris A. Swertz
    Martin Brandt
    Alain J. van Gool
    Peter A. C. ’t Hoen
    Journal of Biomedical Semantics, 15 (1)