Dictionary learning for integrative, multimodal and scalable single-cell analysis

Cited by: 617
Authors
Hao, Yuhan [1, 2]
Stuart, Tim [1, 2]
Kowalski, Madeline H. H. [2, 3]
Choudhary, Saket [1, 2]
Hoffman, Paul [1]
Hartman, Austin [1]
Srivastava, Avi [1, 2]
Molla, Gesmira [2]
Madad, Shaista [1, 2]
Fernandez-Granda, Carlos [4, 5]
Satija, Rahul [1, 2]
Affiliations
[1] NYU, Ctr Genom & Syst Biol, New York, NY 10012 USA
[2] New York Genome Ctr, New York, NY 10013 USA
[3] NYU Langone Med Ctr, Inst Syst Genet, New York, NY USA
[4] NYU, Ctr Data Sci, New York, NY USA
[5] NYU, Courant Inst Math Sci, New York, NY USA
Keywords
RNA-SEQ DATA; CHROMATIN ACCESSIBILITY; T-CELLS; K-SVD; MILD; HETEROGENEITY; PROJECTION; ALIGNMENT; RESOLVES; SPARSE
DOI
10.1038/s41587-023-01767-y
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification code
071005; 0836; 090102; 100705
Abstract
Mapping single-cell sequencing profiles to comprehensive reference datasets provides a powerful alternative to unsupervised analysis. However, most reference datasets are constructed from single-cell RNA-sequencing data and cannot be used to annotate datasets that do not measure gene expression. Here we introduce 'bridge integration', a method to integrate single-cell datasets across modalities using a multiomic dataset as a molecular bridge. Each cell in the multiomic dataset constitutes an element in a 'dictionary', which is used to reconstruct unimodal datasets and transform them into a shared space. Our procedure accurately integrates transcriptomic data with independent single-cell measurements of chromatin accessibility, histone modifications, DNA methylation and protein levels. Moreover, we demonstrate how dictionary learning can be combined with sketching techniques to improve computational scalability and harmonize 8.6 million human immune cell profiles from sequencing and mass cytometry experiments. Our approach, implemented in version 5 of our Seurat toolkit, broadens the utility of single-cell reference datasets and facilitates comparisons across diverse molecular modalities.
Pages: 293-304
Number of pages: 22
Related papers
121 in total
  • [1] Single-cell RNA-seq reveals ectopic and aberrant lung-resident cell populations in idiopathic pulmonary fibrosis
    Adams, Taylor S.
    Schupp, Jonas C.
    Poli, Sergio
    Ayaub, Ehab A.
    Neumark, Nir
    Ahangari, Farida
    Chu, Sarah G.
    Raby, Benjamin A.
    DeTullis, Giuseppe
    Januszyk, Michael
    Duan, Qiaonan
    Arnett, Heather A.
    Siddiqui, Asim
    Washko, George R.
    Homer, Robert
    Yan, Xiting
    Rosas, Ivan O.
    Kaminski, Naftali
    [J]. SCIENCE ADVANCES, 2020, 6 (28)
  • [2] K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation
    Aharon, Michal
    Elad, Michael
    Bruckstein, Alfred
    [J]. IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2006, 54 (11) : 4311 - 4322
  • [3] A blood atlas of COVID-19 defines hallmarks of disease severity and specificity
    Ahern, David J.
    Ai, Zhichao
    Ainsworth, Mark
    Allan, Chris
    Allcock, Alice
    Angus, Brian
    Ansari, M. Azim
    Arancibia-Carcamo, Carolina V.
    Aschenbrenner, Dominik
    Attar, Moustafa
    Baillie, J. Kenneth
    Barnes, Eleanor
    Bashford-Rogers, Rachael
    Bashyal, Archana
    Beer, Sally
    Berridge, Georgina
    Beveridge, Amy
    Bibi, Sagida
    Bicanic, Tihana
    Blackwell, Luke
    Bowness, Paul
    Brent, Andrew
    Brown, Andrew
    Broxholme, John
    Buck, David
    Burnham, Katie L.
    Byrne, Helen
    Camara, Susana
    Ferreira, Ivan Candido
    Charles, Philip
    Chen, Wentao
    Chen, Yi-Ling
    Chong, Amanda
    Clutterbuck, Elizabeth A.
    Coles, Mark
    Conlon, Christopher P.
    Cornall, Richard
    Cribbs, Adam P.
    Curion, Fabiola
    Davenport, Emma E.
    Davidson, Neil
    Davis, Simon
    Dendrou, Calliope A.
    Dequaire, Julie
    Dib, Lea
    Docker, James
    Dold, Christina
    Dong, Tao
    Downes, Damien
    Drakesmith, Hal
    [J]. CELL, 2022, 185 (05) : 916 - +
  • [4] Systems biological assessment of immunity to mild versus severe COVID-19 infection in humans
    Arunachalam, Prabhu S.
    Wimmers, Florian
    Mok, Chris Ka Pun
    Perera, Ranawaka A. P. M.
    Scott, Madeleine
    Hagan, Thomas
    Sigal, Natalia
    Feng, Yupeng
    Bristow, Laurel
    Tsang, Owen Tak-Yin
    Wagh, Dhananjay
    Coller, John
    Pellegrini, Kathryn L.
    Kazmin, Dmitri
    Alaaeddine, Ghina
    Leung, Wai Shing
    Chan, Jacky Man Chun
    Chik, Thomas Shiu Hong
    Choi, Chris Yau Chung
    Huerta, Christopher
    McCullough, Michele Paine
    Lv, Huibin
    Anderson, Evan
    Edupuganti, Srilatha
    Upadhyay, Amit A.
    Bosinger, Steve E.
    Maecker, Holden Terry
    Khatri, Purvesh
    Rouphael, Nadine
    Peiris, Malik
    Pulendran, Bali
    [J]. SCIENCE, 2020, 369 (6508) : 1210 - +
  • [5] Ashuach T., 2021, bioRxiv, DOI 10.1101/2021.08.20.457057
  • [6] Comparative cellular analysis of motor cortex in human, marmoset and mouse
    Bakken, Trygve E.
    Jorstad, Nikolas L.
    Hu, Qiwen
    Lake, Blue B.
    Tian, Wei
    Kalmbach, Brian E.
    Crow, Megan
    Hodge, Rebecca D.
    Krienen, Fenna M.
    Sorensen, Staci A.
    Eggermont, Jeroen
    Yao, Zizhen
    Aevermann, Brian D.
    Aldridge, Andrew I.
    Bartlett, Anna
    Bertagnolli, Darren
    Casper, Tamara
    Castanon, Rosa G.
    Crichton, Kirsten
    Daigle, Tanya L.
    Dalley, Rachel
    Dee, Nick
    Dembrow, Nikolai
    Diep, Dinh
    Ding, Song-Lin
    Dong, Weixiu
    Fang, Rongxin
    Fischer, Stephan
    Goldman, Melissa
    Goldy, Jeff
    Graybuck, Lucas T.
    Herb, Brian R.
    Hou, Xiaomeng
    Kancherla, Jayaram
    Kroll, Matthew
    Lathia, Kanan
    van Lew, Baldur
    Li, Yang Eric
    Liu, Christine S.
    Liu, Hanqing
    Lucero, Jacinta D.
    Mahurkar, Anup
    McMillen, Delissa
    Miller, Jeremy A.
    Moussa, Marmar
    Nery, Joseph R.
    Nicovich, Philip R.
    Niu, Sheng-Yong
    Orvis, Joshua
    Osteen, Julia K.
    [J]. NATURE, 2021, 598 (7879) : 111 - +
  • [7] Joint analysis of heterogeneous single-cell RNA-seq dataset collections
    Barkas, Nikolas
    Petukhov, Viktor
    Nikolaeva, Daria
    Lozinsky, Yaroslav
    Demharter, Samuel
    Khodosevich, Konstantin
    Kharchenko, Peter V.
    [J]. NATURE METHODS, 2019, 16 (08) : 695 - +
  • [8] Supervised principal component analysis: Visualization, classification and regression on subspaces and submanifolds
    Barshan, Elnaz
    Ghodsi, Ali
    Azimifar, Zohreh
    Jahromi, Mansoor Zolghadri
    [J]. PATTERN RECOGNITION, 2011, 44 (07) : 1357 - 1371
  • [9] Single-cell CUT&Tag profiles histone modifications and transcription factors in complex tissues
    Bartosovic, Marek
    Kabbe, Mukund
    Castelo-Branco, Goncalo
    [J]. NATURE BIOTECHNOLOGY, 2021, 39 (07) : 825 - 835
  • [10] Laplacian eigenmaps for dimensionality reduction and data representation
    Belkin, M
    Niyogi, P
    [J]. NEURAL COMPUTATION, 2003, 15 (06) : 1373 - 1396