Stability of Feature Selection in Multi-Omics Data Analysis

Cited by: 0

Authors:

Lukaszuk, Tomasz [1]

Krawczuk, Jerzy [1]

Zyla, Kamil [2]

Kesik, Jacek [2]

Affiliations:

[1] Bialystok Tech Univ, Fac Comp Sci, Wiejska 45A, PL-15351 Bialystok, Poland

[2] Lublin Univ Technol, Fac Elect Engn & Comp Sci, Dept Comp Sci, Nadbystrzycka 36B, PL-20618 Lublin, Poland

Source:

APPLIED SCIENCES-BASEL | 2024 / Volume 14 / Issue 23

Keywords:

multi-omics; high-dimensional data; cancer genomics; feature selection; stability; L1 regularization; CLASSIFICATION; ALGORITHMS

DOI:

10.3390/app142311103

Chinese Library Classification number:

O6 [Chemistry]

Discipline classification code:

0703

Abstract:

In the rapidly evolving field of multi-omics data analysis, understanding the stability of feature selection is critical for reliable biomarker discovery and clinical applications. This study investigates the stability of feature-selection methods across various cancer types using 15 datasets from The Cancer Genome Atlas (TCGA). We employed classifiers with embedded feature selection, including Support Vector Machines (SVM), Logistic Regression (LR), and Lasso regression, each incorporating L1 regularization. Through a comprehensive evaluation using five-fold cross-validation, we measured feature-selection stability and assessed the accuracy of predicting TP53 mutations, a known indicator of poor clinical outcomes in cancer patients. All three classifiers achieved their highest feature-selection stability, as measured by the Nogueira metric, under stronger regularization (fewer selected features), while weaker regularization generally reduced stability across all omics layers. Our findings indicate differences in feature stability across the omics layers: miRNA consistently exhibited the highest stability across classifiers, while the mutation and RNA layers were generally less stable, particularly under weaker regularization. This work highlights the importance of careful feature selection and validation in high-dimensional datasets to enhance the robustness and reliability of multi-omics analyses.
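The abstract names the Nogueira metric but does not spell it out. As a rough illustration only, the sketch below computes the stability estimator usually attributed to Nogueira et al.: one minus the mean unbiased per-feature selection variance, normalized by the variance expected when the same average number of features is picked at random. The function name `nogueira_stability` and the toy fold data are assumptions made for this example and are not taken from the paper; the authors' actual implementation may differ.

```python
from typing import Sequence


def nogueira_stability(selections: Sequence[Sequence[int]]) -> float:
    """Stability estimate for M binary feature-selection vectors.

    selections[i][f] is 1 if feature f was selected in run i
    (for example, one cross-validation fold), else 0.
    """
    m = len(selections)  # number of selection runs (folds)
    if m < 2:
        raise ValueError("need at least two selection runs")
    d = len(selections[0])  # number of candidate features
    if any(len(row) != d for row in selections):
        raise ValueError("all runs must cover the same features")

    # Selection frequency of each feature across runs.
    freq = [sum(row[f] for row in selections) / m for f in range(d)]
    # Unbiased sample variance of each feature's selection indicator.
    var = [m / (m - 1) * p * (1 - p) for p in freq]

    k_bar = sum(freq)  # average number of features selected per run
    denom = (k_bar / d) * (1 - k_bar / d)
    if denom == 0:
        raise ValueError("undefined when no feature or every feature is always selected")
    return 1 - (sum(var) / d) / denom


# Hypothetical example: five folds, six candidate features (toy data, not from the paper).
folds = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 1],
]
print(round(nogueira_stability(folds), 3))
```

A value of 1 means every fold selected exactly the same features; values near 0 indicate agreement no better than random selection of the same average size.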


Pages: 16
