Joint variational autoencoders for multimodal imputation and embedding

Cited by: 18
Authors
Kalafut, Noah Cohen [1, 2]
Huang, Xiang [2]
Wang, Daifeng [1, 2, 3]
Affiliations
[1] Univ Wisconsin Madison, Dept Comp Sci, Madison, WI 53706 USA
[2] Univ Wisconsin Madison, Waisman Ctr, Madison, WI 53706 USA
[3] Univ Wisconsin Madison, Dept Biostat & Med Informat, Madison, WI 53706 USA
Funding
National Science Foundation (USA); National Institutes of Health (USA)
Keywords
35;
DOI
10.1038/s42256-023-00663-z
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
While single-cell multimodal datasets allow individual cells to be measured in order to understand cellular and molecular mechanisms, generating multimodal data for many cells is costly and challenging. Cohen Kalafut and colleagues develop a machine learning model capable of imputing single-cell modalities and prioritizing multimodal features, such as gene expression, chromatin accessibility and electrophysiology.

Single-cell multimodal datasets have measured various characteristics of individual cells, enabling a deep understanding of cellular and molecular mechanisms. However, multimodal data generation remains costly and challenging, and modalities are frequently missing. Recently, machine learning approaches have been developed for data imputation, but they typically require fully matched multimodal data to learn common latent embeddings, which may lack modality specificity. To address these issues, we developed an open-source machine learning model, Joint Variational Autoencoders for multimodal Imputation and Embedding (JAMIE). JAMIE takes single-cell multimodal data in which samples may be only partially matched across modalities. A variational autoencoder learns the latent embeddings of each modality. Embeddings from samples matched across modalities are then aggregated to identify joint cross-modal latent embeddings before reconstruction. To perform cross-modal imputation, the latent embeddings of one modality are passed to the decoder of the other modality. For interpretability, Shapley values are used to prioritize input features for cross-modal imputation and for known sample labels. We applied JAMIE to both simulated data and emerging single-cell multimodal data, including gene expression, chromatin accessibility and electrophysiology in human and mouse brains. JAMIE generally and significantly outperforms existing state-of-the-art methods and prioritizes multimodal features for imputation, providing potentially novel mechanistic insights at cellular resolution.
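The data flow described in the abstract (one variational autoencoder per modality, aggregation of latents for matched samples, and imputation by pairing one modality's encoder with the other's decoder) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the `ToyVAE` class, the single linear layers, the untrained random weights, the simple averaging in `aggregate_matched`, and the toy dimensions are all assumptions made for illustration, and the training loss is omitted entirely.

```python
import numpy as np

rng = np.random.default_rng(0)


def init_linear(n_in, n_out):
    """Random weights for one linear layer (stand-in for a trained network)."""
    return rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out)


class ToyVAE:
    """Hypothetical one-layer encoder/decoder pair; untrained, data flow only."""

    def __init__(self, n_features, n_latent):
        self.w_mu, self.b_mu = init_linear(n_features, n_latent)
        self.w_logvar, self.b_logvar = init_linear(n_features, n_latent)
        self.w_dec, self.b_dec = init_linear(n_latent, n_features)

    def encode(self, x):
        # Returns the mean and log-variance of the approximate posterior.
        return x @ self.w_mu + self.b_mu, x @ self.w_logvar + self.b_logvar

    def sample(self, mu, logvar):
        # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I).
        eps = rng.standard_normal(mu.shape)
        return mu + np.exp(0.5 * logvar) * eps

    def decode(self, z):
        return z @ self.w_dec + self.b_dec


def aggregate_matched(z1, z2, matched_pairs):
    """Replace latents of matched samples with their average (assumed rule)."""
    z1_joint, z2_joint = z1.copy(), z2.copy()
    for i, j in matched_pairs:
        mean = 0.5 * (z1[i] + z2[j])
        z1_joint[i] = mean
        z2_joint[j] = mean
    return z1_joint, z2_joint


def impute(source_vae, target_vae, x_source):
    """Cross-modal imputation: source encoder followed by target decoder."""
    mu, _ = source_vae.encode(x_source)
    return target_vae.decode(mu)


# Toy data: 5 cells with 20 expression features, 4 cells with 30 accessibility features.
x_rna = rng.normal(size=(5, 20))
x_atac = rng.normal(size=(4, 30))
matched_pairs = [(0, 0), (1, 2)]  # (row in x_rna, row in x_atac); partially matched

vae_rna = ToyVAE(n_features=20, n_latent=3)
vae_atac = ToyVAE(n_features=30, n_latent=3)

# Encode each modality separately, then aggregate the matched samples.
z_rna = vae_rna.sample(*vae_rna.encode(x_rna))
z_atac = vae_atac.sample(*vae_atac.encode(x_atac))
z_rna_joint, z_atac_joint = aggregate_matched(z_rna, z_atac, matched_pairs)

# Reconstruct each modality from its joint latents.
recon_rna = vae_rna.decode(z_rna_joint)
recon_atac = vae_atac.decode(z_atac_joint)

# Impute accessibility for all 5 expression-profiled cells.
atac_from_rna = impute(vae_rna, vae_atac, x_rna)
```

In this sketch unmatched cells keep their modality-specific latents, while matched cells share one joint latent, which is one simple way to read "partially matched samples"; a trained model would additionally need reconstruction, KL and alignment terms that are not shown here.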
Pages: 631-+
Number of pages: 15