Intrinsic-dimension analysis for guiding dimensionality reduction and data fusion in multi-omics data processing

Authors
Gliozzo, Jessica [1 ,2 ]
Soto-Gomez, Mauricio [1 ]
Guarino, Valentina [1 ]
Bonometti, Arturo [3 ,4 ]
Cabri, Alberto [1 ]
Cavalleri, Emanuele [1 ]
Reese, Justin [5 ]
Robinson, Peter N. [6 ]
Mesiti, Marco [1 ,5 ]
Valentini, Giorgio [1 ,7 ]
Casiraghi, Elena [1 ,5 ,7 ,8 ]
Affiliations
[1] Univ Studi Milano, Comp Sci Dept, AnacletoLab, Milan, Italy
[2] European Commiss, Joint Res Ctr JRC, Ispra, Italy
[3] Humanitas Univ, Dept Biomed Sci, Milan, Italy
[4] IRCCS Humanitas Clin & Res Hosp, Dept Pathol, Milan, Italy
[5] Lawrence Berkeley Natl Lab, Environm Genom & Syst Biol Div, Berkeley, CA USA
[6] Jackson Lab Genom Med, Farmington, CT USA
[7] Infolife Natl Lab, CINI, Rome, Italy
[8] Aalto Univ, Dept Comp Sci, Espoo, Finland
Keywords
Dimensionality reduction; Intrinsic dimensionality; Feature selection; Feature extraction; Data fusion; Multi-omics datasets; Prediction
DOI
10.1016/j.artmed.2024.103049
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multi-omics data have revolutionized biomedical research by providing a comprehensive understanding of biological systems and the molecular mechanisms of disease development. However, analyzing multi-omics data is challenging because of their high dimensionality and limited sample sizes, which call for proper data-reduction pipelines to ensure reliable analyses. In addition, their multimodal nature requires effective data-integration pipelines. Although several dimensionality reduction and data fusion algorithms have been proposed, crucial aspects are often overlooked. Specifically, the dimension of the projection space is typically chosen heuristically and applied uniformly across all omics, neglecting the distinct high-dimension, small-sample-size challenges faced by individual omics. This paper introduces a novel multi-modal dimensionality reduction pipeline tailored to individual views. By leveraging intrinsic dimensionality estimators, we assess the impact of the curse of dimensionality on each view and propose a two-step reduction strategy for significantly affected views, combining feature selection with feature extraction. Compared with traditional uniform reduction pipelines in a crucial, supervised multi-omics analysis setting, our approach shows significant improvement. Additionally, we explore three effective unsupervised multi-omics data fusion methods rooted in the main data fusion strategies to gain insights into their performance under crucial, yet overlooked, settings.
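To make the idea of view-specific reduction more concrete, the following is a minimal sketch and not the authors' pipeline. The abstract does not name the intrinsic dimensionality estimators it uses, so the sketch assumes the TwoNN maximum-likelihood estimator as one well-known choice. The function `plan_reduction` and its `max_ratio` threshold are hypothetical, introduced only to illustrate how an estimate might decide between a two-step reduction (feature selection followed by feature extraction) and feature extraction alone.

```python
import heapq
import math
import random


def twonn_intrinsic_dimension(points):
    """Estimate intrinsic dimension with the TwoNN maximum-likelihood formula.

    For each point, mu = r2 / r1 is the ratio of the distances to its second
    and first nearest neighbours; the estimate is N / sum(log(mu)).
    """
    log_ratios = []
    for i, p in enumerate(points):
        # Two smallest distances from p to any other point (brute force).
        r1, r2 = heapq.nsmallest(
            2, (math.dist(p, q) for j, q in enumerate(points) if j != i)
        )
        if r1 > 0:  # skip exact duplicates, whose ratio is undefined
            log_ratios.append(math.log(r2 / r1))
    return len(log_ratios) / sum(log_ratios)


def plan_reduction(n_samples, n_features, intrinsic_dim, max_ratio=10):
    """Hypothetical rule (an assumption, not taken from the paper).

    Views whose feature count exceeds max_ratio * n_samples are treated as
    strongly affected and get selection followed by extraction; other views
    get extraction only. The target dimension is the rounded-up estimate.
    """
    target_dim = max(1, math.ceil(intrinsic_dim))
    if n_features > max_ratio * n_samples:
        return ("select_then_extract", target_dim)
    return ("extract_only", target_dim)


# Synthetic "view": 400 samples lying on a 2-D plane embedded in 5 features.
rng = random.Random(0)
view = []
for _ in range(400):
    u, v = rng.random(), rng.random()
    view.append((u, v, u + v, u - v, 0.5 * u))

estimate = twonn_intrinsic_dimension(view)
plan = plan_reduction(n_samples=len(view), n_features=5, intrinsic_dim=estimate)
```

On this synthetic view the estimate should fall near 2, the true dimension of the plane, even though the view has 5 features. The brute-force neighbour search is quadratic in the number of samples, which is acceptable for the small sample sizes the abstract describes but would need a spatial index for larger data.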
Pages: 13
Related papers
50 in total
  • [1] Dimension reduction techniques for the integrative analysis of multi-omics data
    Meng, Chen
    Zeleznik, Oana A.
    Thallinger, Gerhard G.
    Kuster, Bernhard
    Gholami, Amin M.
    Culhane, Aedin C.
    BRIEFINGS IN BIOINFORMATICS, 2016, 17 (04) : 628 - 641
  • [2] Integrative Sufficient Dimension Reduction Methods for Multi-Omics Data Analysis
    Jain, Yashita
    Ding, Shanshan
    ACM-BCB' 2017: PROCEEDINGS OF THE 8TH ACM INTERNATIONAL CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY, AND HEALTH INFORMATICS, 2017, : 616 - 616
  • [3] Analysis of Incomplete Data and an Intrinsic-Dimension Helly Theorem
    Gao, Jie
    Langberg, Michael
    Schulman, Leonard J.
    PROCEEDINGS OF THE SEVENTEENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, 2006, : 464 - +
  • [4] Analysis of Incomplete Data and an Intrinsic-Dimension Helly Theorem
    Jie Gao
    Michael Langberg
    Leonard J. Schulman
    Discrete & Computational Geometry, 2008, 40 : 537 - 560
  • [5] Analysis of Incomplete Data and an Intrinsic-Dimension Helly Theorem
    Gao, Jie
    Langberg, Michael
    Schulman, Leonard J.
    DISCRETE & COMPUTATIONAL GEOMETRY, 2008, 40 (04) : 537 - 560
  • [6] Dealing with dimensionality: the application of machine learning to multi-omics data
    Feldner-Busztin, Dylan
    Nisantzis, Panos Firbas
    Edmunds, Shelley Jane
    Boza, Gergely
    Racimo, Fernando
    Gopalakrishnan, Shyam
    Limborg, Morten Tonsberg
    Lahti, Leo
    de Polavieja, Gonzalo G.
    BIOINFORMATICS, 2023, 39 (02)
  • [7] Survey on Multi-omics, and Multi-omics Data Analysis, Integration and Application
    Shahrajabian, Mohamad Hesam
    Sun, Wenli
    CURRENT PHARMACEUTICAL ANALYSIS, 2023, 19 (04) : 267 - 281
  • [8] Visual analysis of multi-omics data
    Swart, Austin
    Caspi, Ron
    Paley, Suzanne
    Karp, Peter D.
    FRONTIERS IN BIOINFORMATICS, 2024, 4
  • [9] A practical data processing workflow for multi-OMICS projects
    Kohl, Michael
    Megger, Dominik A.
    Trippler, Martin
    Meckel, Hagen
    Ahrens, Maike
    Bracht, Thilo
    Weber, Frank
    Hoffmann, Andreas-Claudius
    Baba, Hideo A.
    Sitek, Barbara
    Schlaak, Joerg F.
    Meyer, Helmut E.
    Stephan, Christian
    Eisenacher, Martin
    BIOCHIMICA ET BIOPHYSICA ACTA-PROTEINS AND PROTEOMICS, 2014, 1844 (01) : 52 - 62
  • [10] Multi-insight visualization of multi-omics data via ensemble dimension reduction and tensor factorization
    Fanaee-T, Hadi
    Thoresen, Magne
    BIOINFORMATICS, 2019, 35 (10) : 1625 - 1633