A deep contrastive multi-modal encoder for multi-omics data integration and analysis

Authors
Yinghua, Ma [1]
Khan, Ahmad [1]
Heng, Yang [1]
Khan, Fiaz Gul [1]
Ali, Farman [2]
Al-Otaibi, Yasser D. [3]
Bashir, Ali Kashif [4, 5]
Affiliations
[1] COMSATS Univ Islamabad, Dept Comp Sci, Abbottabad Campus, Abbottabad 22010, Pakistan
[2] Sungkyunkwan Univ, Coll Comp & Informat, Sch Convergence, Dept Appl AI, Seoul 03063, South Korea
[3] King Abdulaziz Univ, Fac Comp & Informat Technol Rabigh, Dept Informat Syst, Jeddah 21589, Saudi Arabia
[4] Manchester Metropolitan Univ, Dept Comp & Math, Manchester, England
[5] Chitkara Univ, Chitkara Univ Inst Engn & Technol, Ctr Res Impact & Outcome, Rajpura 140401, Punjab, India
Keywords
Deep learning; Cancer classification; Clustering; Survival analysis; Multi-omics data; Contrastive learning; Cancer analysis; Dimensionality reduction; Artificial intelligence; Cancer subtypes; Identification
DOI
10.1016/j.ins.2024.121864
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Cancer is a highly complex and fatal disease that affects various human organs. Early and accurate cancer analysis is crucial for timely treatment, prognosis, and understanding of the disease's development. Recent research utilizes deep learning-based models to combine multi-omics data for tasks such as cancer classification, clustering, and survival prediction. However, these models often overlook interactions between different types of data, which leads to suboptimal performance. In this paper, we present a Contrastive Multi-Modal Encoder (CMME) that integrates and maps multi-omics data into a lower-dimensional latent space, enabling the model to better understand relationships between different data types. The challenging distribution and organization of the data into anchors, positive samples, and negative samples encourage the model to learn synergies among different modalities, pay attention to both strong and weak modalities, and avoid biased learning. The performance of the proposed model is evaluated on downstream tasks such as clustering, classification, and survival prediction. The CMME achieved an accuracy of 98.16% and an F1 score of 98.09% in classifying breast cancer subtypes. For clustering tasks across ten cancer types based on TCGA data, the adjusted Rand index reached 0.966. Additionally, survival analysis results highlighted significant differences in survival rates between different cancer subtypes. The comprehensive qualitative and quantitative results demonstrate that the proposed method outperforms existing methods.
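The abstract describes organizing data into anchors, positive samples, and negative samples, but it does not state which loss function CMME uses. As a rough illustration of the general idea only, the sketch below implements a generic triplet margin loss in plain Python. The function names, the margin value, and the toy 2-D embeddings are all assumptions made for this example and are not taken from the paper.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Generic triplet margin loss (an assumption, not the paper's stated loss).

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is; otherwise it grows linearly.
    """
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)

# Hypothetical 2-D latent embeddings (illustrative values only).
anchor = [0.0, 0.0]
positive = [0.0, 1.0]   # should end up close to the anchor
negative = [3.0, 4.0]   # should end up far from the anchor

# Positive is near and negative is far, so nothing is penalized.
print(triplet_loss(anchor, positive, negative))

# Swapping the roles puts the "positive" far away, so the loss is positive.
print(triplet_loss(anchor, negative, positive))
```

Minimizing a loss of this shape over an encoder's outputs pulls anchor and positive embeddings together and pushes negatives apart in the latent space, which is the behavior the abstract attributes to the anchor, positive, and negative organization.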
Pages: 14