A deep contrastive multi-modal encoder for multi-omics data integration and analysis

Cited by: 0
Authors
Yinghua, Ma [1]
Khan, Ahmad [1]
Heng, Yang [1]
Khan, Fiaz Gul [1]
Ali, Farman [2]
Al-Otaibi, Yasser D. [3]
Bashir, Ali Kashif [4, 5]
Affiliations
[1] COMSATS Univ Islamabad, Dept Comp Sci, Abbottabad Campus, Abbottabad 22010, Pakistan
[2] Sungkyunkwan Univ, Coll Comp & Informat, Sch Convergence, Dept Appl AI, Seoul 03063, South Korea
[3] King Abdulaziz Univ, Fac Comp & Informat Technol Rabigh, Dept Informat Syst, Jeddah 21589, Saudi Arabia
[4] Manchester Metropolitan Univ, Dept Comp & Math, Manchester, England
[5] Chitkara Univ, Chitkara Univ Inst Engn & Technol, Ctr Res Impact & Outcome, Rajpura 140401, Punjab, India
Keywords
Deep learning; Cancer classification; Clustering; Survival analysis; Multi-omics data; Contrastive learning; Cancer analysis; Dimensionality reduction; Artificial intelligence; Cancer subtypes; Identification
DOI
10.1016/j.ins.2024.121864
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Cancer is a highly complex and fatal disease that affects various human organs. Early and accurate cancer analysis is crucial for timely treatment, prognosis, and understanding of the disease's development. Recent research utilizes deep learning-based models to combine multi-omics data for tasks such as cancer classification, clustering, and survival prediction. However, these models often overlook interactions between different types of data, which leads to suboptimal performance. In this paper, we present a Contrastive Multi-Modal Encoder (CMME) that integrates and maps multi-omics data into a lower-dimensional latent space, enabling the model to better understand relationships between different data types. The challenging distribution and organization of the data into anchors, positive samples, and negative samples encourage the model to learn synergies among different modalities, pay attention to both strong and weak modalities, and avoid biased learning. The performance of the proposed model is evaluated on downstream tasks such as clustering, classification, and survival prediction. The CMME achieved an accuracy of 98.16% and an F1 score of 98.09% in classifying breast cancer subtypes. For clustering tasks across ten cancer types based on TCGA data, the adjusted Rand index reached 0.966. Additionally, survival analysis results highlighted significant differences in survival rates between different cancer subtypes. The comprehensive qualitative and quantitative results demonstrate that the proposed method outperforms existing methods.
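The abstract describes organizing samples into anchors, positives, and negatives but does not state the loss function. As a minimal sketch, assuming an InfoNCE-style objective (a common choice for contrastive learning, not confirmed as the one CMME uses), the idea can be illustrated as follows. The function names, the temperature value, and the toy embeddings are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for one anchor, one positive, and several negatives.

    Assumption: this is a generic contrastive objective used for
    illustration, not the loss published for CMME.
    """
    pos_logit = cosine(anchor, positive) / temperature
    logits = [pos_logit] + [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over the positive and all negatives.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
    # Negative log-probability of picking the positive among all candidates.
    return log_denominator - pos_logit


# Hypothetical latent embeddings: the anchor and the positive stand for two
# omics views of the same patient, the negatives for other patients.
anchor = [0.9, 0.1, 0.3]
positive = [0.8, 0.2, 0.35]
negatives = [[-0.5, 0.9, 0.1], [0.1, -0.7, 0.6]]

aligned_loss = info_nce_loss(anchor, positive, negatives)
misaligned_loss = info_nce_loss(anchor, negatives[0], [positive, negatives[1]])
```

In this toy setup the loss is small when the positive lies close to the anchor and large when a dissimilar sample is treated as the positive, which is the pressure that pulls matching views together in the latent space.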
Pages: 14