Intrinsic-dimension analysis for guiding dimensionality reduction and data fusion in multi-omics data processing

Times cited: 0
Authors
Gliozzo, Jessica [1, 2]
Soto-Gomez, Mauricio [1]
Guarino, Valentina [1]
Bonometti, Arturo [3, 4]
Cabri, Alberto [1]
Cavalleri, Emanuele [1]
Reese, Justin [5]
Robinson, Peter N. [6]
Mesiti, Marco [1, 5]
Valentini, Giorgio [1, 7]
Casiraghi, Elena [1, 5, 7, 8]
Affiliations
[1] Univ Studi Milano, Comp Sci Dept, AnacletoLab, Milan, Italy
[2] European Commiss, Joint Res Ctr JRC, Ispra, Italy
[3] Humanitas Univ, Dept Biomed Sci, Milan, Italy
[4] IRCCS Humanitas Clin & Res Hosp, Dept Pathol, Milan, Italy
[5] Lawrence Berkeley Natl Lab, Environm Genom & Syst Biol Div, Berkeley, CA USA
[6] Jackson Lab Genom Med, Farmington, CT USA
[7] Infolife Natl Lab, CINI, Rome, Italy
[8] Aalto Univ, Dept Comp Sci, Espoo, Finland
Keywords
Dimensionality reduction; Intrinsic dimensionality; Feature selection; Feature extraction; Data fusion; Multi-omics datasets; PREDICTION
DOI
10.1016/j.artmed.2024.103049
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Multi-omics data have revolutionized biomedical research by providing a comprehensive understanding of biological systems and the molecular mechanisms of disease development. However, analyzing multi-omics data is challenging due to high dimensionality and limited sample sizes, necessitating proper data-reduction pipelines to ensure reliable analyses. Additionally, their multimodal nature requires effective data-integration pipelines. While several dimensionality reduction and data fusion algorithms have been proposed, crucial aspects are often overlooked. Specifically, the choice of projection space dimension is typically heuristic and uniformly applied across all omics, neglecting the unique high-dimension, small-sample-size challenges faced by individual omics. This paper introduces a novel multi-modal dimensionality reduction pipeline tailored to individual views. By leveraging intrinsic dimensionality estimators, we assess the curse-of-dimensionality impact on each view and propose a two-step reduction strategy for significantly affected views, combining feature selection with feature extraction. Compared to traditional uniform reduction pipelines in a crucial and supervised multi-omics analysis setting, our approach shows significant improvement. Additionally, we explore three effective unsupervised multi-omics data fusion methods rooted in the main data fusion strategies to gain insights into their performance under crucial, yet overlooked, settings.
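The per-view idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say which intrinsic-dimension estimators, feature-selection method, or feature-extraction method the authors use, so everything below is an assumption made for illustration: the TwoNN maximum-likelihood estimator stands in for "intrinsic dimensionality estimators", variance ranking stands in for feature selection, PCA (via SVD) stands in for feature extraction, and the names `twonn_id`, `two_step_reduce`, `reduce_view`, `max_ratio`, and `selection_factor` are hypothetical.

```python
import numpy as np


def twonn_id(X):
    """TwoNN maximum-likelihood estimate of intrinsic dimension.

    Uses the ratio mu = r2 / r1 of each sample's second to first
    nearest-neighbour distance; the estimate is N / sum(log(mu)).
    """
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared distances
    d2 = np.maximum(d2, 0.0)                          # clip rounding noise
    np.fill_diagonal(d2, np.inf)                      # ignore self-distance
    two_smallest = np.partition(d2, 1, axis=1)[:, :2]
    r1 = np.sqrt(two_smallest[:, 0])
    r2 = np.sqrt(two_smallest[:, 1])
    keep = r1 > 0                                     # drop duplicate points
    mu = r2[keep] / r1[keep]
    return keep.sum() / np.sum(np.log(mu))


def two_step_reduce(X, n_selected, n_components):
    """Step 1: keep the highest-variance features (assumed selector).
    Step 2: project the selected features with PCA (assumed extractor)."""
    top = np.argsort(X.var(axis=0))[::-1][:n_selected]
    Xs = X[:, top] - X[:, top].mean(axis=0)
    U, S, _ = np.linalg.svd(Xs, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


def reduce_view(X, max_ratio=10.0, selection_factor=5):
    """Reduce one omics view to a dimension driven by its estimated ID.

    The features-per-sample threshold `max_ratio` is a hypothetical
    criterion for "significantly affected" views, not the paper's.
    """
    n, p = X.shape
    k = max(1, int(np.ceil(twonn_id(X))))
    k = min(k, n - 1, p)
    if p / n > max_ratio:
        # Strongly affected view: select features first, then extract.
        return two_step_reduce(X, min(p, selection_factor * n), k)
    # Mildly affected view: feature extraction alone.
    return two_step_reduce(X, p, k)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic "views" with different shapes, for illustration only.
    views = {
        "view_a": rng.normal(size=(100, 3)) @ rng.normal(size=(3, 40)),
        "view_b": rng.normal(size=(100, 5)) @ rng.normal(size=(5, 3000)),
    }
    for name, X in views.items():
        print(name, X.shape, "->", reduce_view(X).shape)
```

The point of the sketch is only that each view gets its own target dimension, derived from its own estimated intrinsic dimension, instead of one heuristic value shared across all omics.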
Pages: 13