Stability of Feature Selection in Multi-Omics Data Analysis

Cited by: 1
Authors
Lukaszuk, Tomasz [1]
Krawczuk, Jerzy [1]
Zyla, Kamil [2]
Kesik, Jacek [2]
Affiliations
[1] Bialystok Tech Univ, Fac Comp Sci, Wiejska 45A, PL-15351 Bialystok, Poland
[2] Lublin Univ Technol, Fac Elect Engn & Comp Sci, Dept Comp Sci, Nadbystrzycka 36B, PL-20618 Lublin, Poland
Source
APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 23
Keywords
multi-omics; high-dimensional data; cancer genomics; feature selection; stability; L1 regularization; CLASSIFICATION; ALGORITHMS
DOI
10.3390/app142311103
Chinese Library Classification (CLC) number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
In the rapidly evolving field of multi-omics data analysis, understanding the stability of feature selection is critical for reliable biomarker discovery and clinical applications. This study investigates the stability of feature-selection methods across various cancer types using 15 datasets from The Cancer Genome Atlas (TCGA). We employed classifiers with embedded feature selection, including Support Vector Machines (SVM), Logistic Regression (LR), and Lasso regression, each incorporating L1 regularization. Using five-fold cross-validation, we measured feature-selection stability and assessed the accuracy of predicting TP53 mutations, a known indicator of poor clinical outcomes in cancer patients. All three classifiers achieved their highest feature-selection stability, as measured by the Nogueira metric, under stronger regularization (fewer selected features), while weaker regularization generally reduced stability across all omics layers. Feature stability also differed across omics layers: miRNA consistently exhibited the highest stability across classifiers, while the mutation and RNA layers were generally less stable, particularly under weaker regularization. This work highlights the importance of careful feature selection and validation in high-dimensional datasets to enhance the robustness and reliability of multi-omics analyses.
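The abstract names the Nogueira metric but this record does not define it. The following is a minimal sketch, assuming the commonly cited definition from Nogueira, Sechidis, and Brown (2018): one minus the average per-feature selection variance, normalized by the variance expected if the same average number of features were picked at random. The function name `nogueira_stability` and the input layout (one binary row per cross-validation fold) are illustrative assumptions, not taken from the paper.

```python
def nogueira_stability(selections):
    """Stability of M feature subsets over d features.

    selections: list of M lists of length d, where 1 means the feature
    was selected in that run (e.g. one row per cross-validation fold).
    Assumed definition: 1 - mean_f(s_f^2) / ((k/d) * (1 - k/d)).
    """
    m = len(selections)
    if m < 2:
        raise ValueError("need at least two feature subsets")
    d = len(selections[0])

    # Fraction of runs in which each feature was selected.
    freq = [sum(row[f] for row in selections) / m for f in range(d)]
    # Unbiased sample variance of each feature's selection indicator.
    var = [m / (m - 1) * p * (1 - p) for p in freq]

    # Average number of features selected per run.
    k_bar = sum(sum(row) for row in selections) / m
    denom = (k_bar / d) * (1 - k_bar / d)
    if denom == 0:
        raise ValueError("undefined when no features or all features are selected")

    return 1 - (sum(var) / d) / denom


# Hypothetical example: five folds that all select the same two features.
folds = [[1, 1, 0, 0] for _ in range(5)]
print(nogueira_stability(folds))
```

Under this definition, identical selections in every fold give the maximum value of 1, and values near 0 indicate agreement no better than random selection of the same number of features.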
Pages: 16