CLCLSA: Cross-omics linked embedding with contrastive learning and self attention for integration with incomplete multi-omics data

Cited by: 3
Authors
Zhao, Chen [1]
Liu, Anqi [2]
Zhang, Xiao [2]
Cao, Xuewei [3]
Ding, Zhengming [4]
Sha, Qiuying [3]
Shen, Hui [2]
Deng, Hong-Wen [2, 8]
Zhou, Weihua [5, 6, 7]
Affiliations
[1] Kennesaw State Univ, Dept Comp Sci, Marietta, GA 30060 USA
[2] Tulane Univ, Tulane Ctr Biomed Informat & Genom, Deming Dept Med, Div Biomed Informat & Genom, New Orleans, LA 70112 USA
[3] Michigan Technol Univ, Dept Math Sci, 1400 Townsend Dr, Houghton, MI 49931 USA
[4] Tulane Univ, Dept Comp Sci, New Orleans, LA 70118 USA
[5] Michigan Technol Univ, Dept Appl Comp, 1400 Townsend Dr, Houghton, MI 49931 USA
[6] Michigan Technol Univ, Inst Comp & Cybersyst, Ctr Biocomp & Digital Hlth, Houghton, MI 49931 USA
[7] Michigan Technol Univ, Hlth Res Inst, Houghton, MI 49931 USA
[8] Tulane Univ, Tulane Ctr Biomed Informat & Genom, Deming Dept Med, 1440 Canal St,Suite 1619F, New Orleans, LA 70112 USA
Funding
National Institutes of Health (USA);
Keywords
Multi-omics integration; Incomplete omics data; Deep learning; Autoencoders; Contrastive learning;
DOI
10.1016/j.compbiomed.2024.108058
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Integration of heterogeneous and high-dimensional multi-omics data is becoming increasingly important in understanding the etiology of complex genetic diseases. Each omics technique provides only a limited view of the underlying biological process, and integrating heterogeneous omics layers simultaneously would lead to a more comprehensive and detailed understanding of diseases and phenotypes. However, one obstacle in multi-omics data integration is the existence of unpaired multi-omics data due to instrument sensitivity and cost. Studies may fail if certain aspects of the subjects are missing or incomplete. In this paper, we propose a deep learning method for multi-omics integration with incomplete data: Cross-omics Linked unified embedding with Contrastive Learning and Self Attention (CLCLSA). Using complete multi-omics data as supervision, the model employs cross-omics autoencoders to learn feature representations across different types of biological data. Multi-omics contrastive learning is employed to maximize the mutual information between different types of omics. In addition, feature-level and omics-level self-attention are employed to dynamically identify the most informative features for multi-omics data integration. Finally, a Softmax classifier performs multi-omics data classification. Extensive experiments were conducted on four public multi-omics datasets. The experimental results indicate that the proposed CLCLSA produces promising results in multi-omics data classification using both complete and incomplete multi-omics data.
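The abstract states that contrastive learning is used to maximize the mutual information between omics types but does not give the loss. As a rough illustration only, the sketch below uses the InfoNCE objective, a common lower-bound surrogate for mutual information; whether CLCLSA uses this exact form is an assumption. The function names (`cosine`, `info_nce`), the temperature value, and the toy embeddings are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(z_a, z_b, temperature=0.5):
    """Mean InfoNCE loss between two sets of embeddings.

    Assumption: row i of z_a and row i of z_b are embeddings of the same
    sample from two omics types (the positive pair); every other row of
    z_b serves as a negative for row i of z_a.
    """
    n = len(z_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(z_a[i], z_b[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of picking the true partner.
        total += log_denom - logits[i]
    return total / n


# Toy embeddings for three samples in two hypothetical omics views.
mrna = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
meth_aligned = [[0.9, 0.1], [0.1, 0.9], [-0.9, -0.1]]
# Same vectors, but paired with the wrong samples.
meth_shuffled = [meth_aligned[1], meth_aligned[2], meth_aligned[0]]

loss_aligned = info_nce(mrna, meth_aligned)
loss_shuffled = info_nce(mrna, meth_shuffled)
print(loss_aligned < loss_shuffled)
```

The loss is lower when embeddings of the same sample agree across views, so minimizing it pulls matched cross-omics embeddings together and pushes mismatched ones apart.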
Pages: 13