A deep contrastive multi-modal encoder for multi-omics data integration and analysis

Cited by: 0
Authors
Yinghua, Ma [1]
Khan, Ahmad [1]
Heng, Yang [1]
Khan, Fiaz Gul [1]
Ali, Farman [2]
Al-Otaibi, Yasser D. [3]
Bashir, Ali Kashif [4, 5]
Affiliations
[1] COMSATS Univ Islamabad, Dept Comp Sci, Abbottabad Campus, Abbottabad 22010, Pakistan
[2] Sungkyunkwan Univ, Coll Comp & Informat, Sch Convergence, Dept Appl AI, Seoul 03063, South Korea
[3] King Abdulaziz Univ, Fac Comp & Informat Technol Rabigh, Dept Informat Syst, Jeddah 21589, Saudi Arabia
[4] Manchester Metropolitan Univ, Dept Comp & Math, Manchester, England
[5] Chitkara Univ, Chitkara Univ Inst Engn & Technol, Ctr Res Impact & Outcome, Rajpura 140401, Punjab, India
Keywords
Deep learning; Cancer classification; Clustering; Survival analysis; Multi-omics data; Contrastive learning; Cancer analysis; Dimensionality reduction; Artificial intelligence; Cancer subtypes; Identification
DOI
10.1016/j.ins.2024.121864
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Cancer is a highly complex and fatal disease that affects various human organs. Early and accurate cancer analysis is crucial for timely treatment, prognosis, and understanding of the disease's development. Recent research utilizes deep learning-based models to combine multi-omics data for tasks such as cancer classification, clustering, and survival prediction. However, these models often overlook interactions between different types of data, which leads to suboptimal performance. In this paper, we present a Contrastive Multi-Modal Encoder (CMME) that integrates and maps multi-omics data into a lower-dimensional latent space, enabling the model to better understand relationships between different data types. The challenging distribution and organization of the data into anchors, positive samples, and negative samples encourage the model to learn synergies among different modalities, pay attention to both strong and weak modalities, and avoid biased learning. The performance of the proposed model is evaluated on downstream tasks such as clustering, classification, and survival prediction. The CMME achieved an accuracy of 98.16% and an F1 score of 98.09% in classifying breast cancer subtypes. For clustering tasks across ten cancer types based on TCGA data, the adjusted Rand index reached 0.966. Additionally, survival analysis results highlighted significant differences in survival rates between different cancer subtypes. The comprehensive qualitative and quantitative results demonstrate that the proposed method outperforms existing methods.
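The abstract describes organizing samples into anchors, positives, and negatives but does not state the loss function. As an illustration only, the sketch below uses a standard triplet margin loss, which is an assumed stand-in and not necessarily the objective used in CMME. The function names, the margin value, and the toy 2-D embeddings are all hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchors, positives, negatives, margin=1.0):
    """Mean triplet margin loss over a batch of latent embeddings.

    Each triplet contributes max(0, d(anchor, positive) - d(anchor, negative) + margin),
    so the loss is zero once the negative is at least `margin` farther from the
    anchor than the positive is.
    """
    losses = [
        max(0.0, euclidean(a, p) - euclidean(a, n) + margin)
        for a, p, n in zip(anchors, positives, negatives)
    ]
    return sum(losses) / len(losses)


# Toy 2-D embeddings (hypothetical values, not from the paper).
anchors = [[0.0, 0.0], [1.0, 1.0]]
positives = [[0.0, 1.0], [1.0, 1.0]]
negatives = [[3.0, 4.0], [1.0, 1.5]]

loss = triplet_margin_loss(anchors, positives, negatives)
print(loss)
```

In this toy batch the first triplet is already separated by more than the margin and contributes nothing, while the second has a negative that sits too close to its anchor and therefore drives the loss.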
Pages: 14