Stability of Feature Selection in Multi-Omics Data Analysis

Times cited: 0
Authors
Lukaszuk, Tomasz [1]
Krawczuk, Jerzy [1]
Zyla, Kamil [2]
Kesik, Jacek [2]
Affiliations
[1] Bialystok Tech Univ, Fac Comp Sci, Wiejska 45A, PL-15351 Bialystok, Poland
[2] Lublin Univ Technol, Fac Elect Engn & Comp Sci, Dept Comp Sci, Nadbystrzycka 36B, PL-20618 Lublin, Poland
Source
APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 23
Keywords
multi-omics; high-dimensional data; cancer genomics; feature selection; stability; L1; regularization; CLASSIFICATION; ALGORITHMS
DOI
10.3390/app142311103
Chinese Library Classification
O6 [Chemistry]
Subject classification code
0703
Abstract
In the rapidly evolving field of multi-omics data analysis, understanding the stability of feature selection is critical for reliable biomarker discovery and clinical applications. This study investigates the stability of feature-selection methods across various cancer types using 15 datasets from The Cancer Genome Atlas (TCGA). We employed three classifiers with embedded feature selection: Support Vector Machines (SVM), Logistic Regression (LR), and Lasso regression, each incorporating L1 regularization. Using five-fold cross-validation, we measured feature-selection stability and assessed the accuracy of predicting TP53 mutations, a known indicator of poor clinical outcomes in cancer patients. All three classifiers achieved their highest feature-selection stability, as measured by the Nogueira metric, under stronger regularization (fewer selected features), whereas weaker regularization generally decreased stability across all omics layers. Stability also differed between omics layers: the miRNA layer was consistently the most stable across classifiers, while the mutation and RNA layers were generally less stable, particularly under weaker regularization. This work highlights the importance of careful feature selection and validation in high-dimensional datasets for improving the robustness and reliability of multi-omics analyses.
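The stability measure named in the abstract can be illustrated with a small sketch. Assume that each of the M resampling runs (for example, the five cross-validation folds) yields one binary row over the d features. In that row, a feature counts as selected when its fitted L1 coefficient is non-zero. Under that assumption, the Nogueira estimator is 1 - mean_f(s_f^2) / ((k/d)(1 - k/d)). Here s_f^2 is the unbiased variance of feature f's selection indicator, and k is the average subset size. A value of 1 means every run picked the same features, and values near 0 are what random subsets of the same size would give. The helper names `selection_row` and `nogueira_stability` and the `tol` threshold are hypothetical. This is not the authors' code, and this record does not say how the study converted fitted models into feature sets.

```python
from __future__ import annotations

from typing import Sequence


def selection_row(coefficients: Sequence[float], tol: float = 1e-12) -> list[int]:
    """Hypothetical helper: mark a feature as selected (1) when its L1 coefficient is non-zero."""
    return [1 if abs(w) > tol else 0 for w in coefficients]


def nogueira_stability(z: Sequence[Sequence[int]]) -> float:
    """Nogueira stability estimate for an M x d binary selection matrix.

    Row i marks the features selected in resampling run i (e.g. one CV fold).
    """
    m = len(z)
    if m < 2:
        raise ValueError("need at least two feature sets")
    d = len(z[0])
    if any(len(row) != d for row in z):
        raise ValueError("all rows must cover the same features")

    # Sum over features of the unbiased variance s_f^2 = M/(M-1) * p_f * (1 - p_f),
    # where p_f is the fraction of runs that selected feature f.
    var_sum = 0.0
    for f in range(d):
        p_f = sum(row[f] for row in z) / m
        var_sum += m / (m - 1) * p_f * (1 - p_f)

    # Average number of selected features per run.
    k_bar = sum(sum(row) for row in z) / m
    denom = (k_bar / d) * (1 - k_bar / d)
    if denom == 0:
        raise ValueError("undefined when every run selects no features or all features")
    return 1 - (var_sum / d) / denom


if __name__ == "__main__":
    # Made-up coefficient vectors from three L1-regularized fits (not data from the study).
    folds = [
        [0.8, 0.0, -0.3, 0.0],
        [0.5, 0.0, -0.1, 0.0],
        [0.7, 0.2, 0.0, 0.0],
    ]
    z = [selection_row(w) for w in folds]
    print(round(nogueira_stability(z), 3))  # → 0.333
```

Higher regularization drives more coefficients to exactly zero. This shrinks each row's selected set, and the estimator measures how consistent those sets remain across the folds.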
Pages: 16
Related papers
50 in total
  • [41] Clustering and variable selection evaluation of 13 unsupervised methods for multi-omics data integration
    Pierre-Jean, Morgane
    Deleuze, Jean-Francois
    Le Floch, Edith
    Mauger, Florence
    BRIEFINGS IN BIOINFORMATICS, 2020, 21 (06) : 2011 - 2030
  • [42] Analysis and comparison of feature selection methods towards performance and stability
    Barbieri, Matheus Cezimbra
    Grisci, Bruno Iochins
    Dorn, Marcio
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 249
  • [43] Adaptive Sparse Multi-Block PLS Discriminant Analysis: An Integrative Method for Identifying Key Biomarkers from Multi-Omics Data
    Zhang, Runzhi
    Datta, Susmita
    GENES, 2023, 14 (05)
  • [44] Machine Learning: A New Prospect in Multi-Omics Data Analysis of Cancer
    Arjmand, Babak
    Hamidpour, Shayesteh Kokabi
    Tayanloo-Beik, Akram
    Goodarzi, Parisa
    Aghayan, Hamid Reza
    Adibi, Hossein
    Larijani, Bagher
    FRONTIERS IN GENETICS, 2022, 13
  • [45] A Review of the Stability of Feature Selection Techniques for Bioinformatics Data
    Awada, Wael
    Khoshgoftaar, Taghi M.
    Dittman, David
    Wald, Randall
    Napolitano, Amri
    2012 IEEE 13TH INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION (IRI), 2012 : 356 - 363
  • [46] From Data to Cure: A Comprehensive Exploration of Multi-omics Data Analysis for Targeted Therapies
    Mukherjee, Arnab
    Abraham, Suzanna
    Singh, Akshita
    Balaji, S.
    Mukunthan, K. S.
    MOLECULAR BIOTECHNOLOGY, 2024, 67 (4) : 1269 - 1289
  • [47] AVBAE-MODFR: A novel deep learning framework of embedding and feature selection on multi-omics data for pan-cancer classification
    Li, M.
    Guo, H.
    Wang, K.
    Kang, C.
    Yin, Y.
    Zhang, H.
    Computers in Biology and Medicine, 2024, 177
  • [48] asmbPLS: biomarker identification and patient survival prediction with multi-omics data
    Zhang, Runzhi
    Datta, Susmita
    FRONTIERS IN GENETICS, 2024, 15
  • [49] The Strategies and Progression in The Stratification of Hepatocellular Carcinoma Using Multi-omics Data
    Wang, Meng
    Li, Xiao-Qin
    Gao, Bin
    PROGRESS IN BIOCHEMISTRY AND BIOPHYSICS, 2023, 50 (07) : 1651 - 1663
  • [50] Priority-Elastic net for binary disease outcome prediction based on multi-omics data
    Musib, Laila
    Coletti, Roberta
    Lopes, Marta B.
    Mourino, Helena
    Carrasquinha, Eunice
    BIODATA MINING, 2024, 17 (01)