Deep learning enables accurate clustering with batch effect removal in single-cell RNA-seq analysis

Cited by: 425
Authors
Li, Xiangjie [1, 2, 3]
Wang, Kui [1, 4, 5]
Lyu, Yafei [1 ]
Pan, Huize [6 ]
Zhang, Jingxiao [2 ]
Stambolian, Dwight [7 ]
Susztak, Katalin [8 ]
Reilly, Muredach P. [6 ]
Hu, Gang [1, 9]
Li, Mingyao [1 ]
Affiliations
[1] Univ Penn, Perelman Sch Med, Dept Biostat Epidemiol & Informat, Philadelphia, PA 19104 USA
[2] Renmin Univ China, Ctr Appl Stat, Sch Stat, Beijing 100872, Peoples R China
[3] Chinese Acad Med Sci & Peking Union Med Coll, Fuwai Hosp, Natl Ctr Cardiovasc Dis, State Key Lab Cardiovasc Dis, Beijing 100037, Peoples R China
[4] Nankai Univ, Sch Math Sci, Dept Informat Theory & Data Sci, Tianjin 300071, Peoples R China
[5] Nankai Univ, LPMC, Tianjin 300071, Peoples R China
[6] Columbia Univ, Dept Med, Div Cardiol, Med Ctr, New York, NY 10032 USA
[7] Univ Penn, Perelman Sch Med, Dept Ophthalmol, Philadelphia, PA 19104 USA
[8] Univ Penn, Perelman Sch Med, Dept Med & Genet, Philadelphia, PA 19104 USA
[9] Nankai Univ, Sch Stat & Data Sci, Key Lab Med Data Anal & Stat Res Tianjin, Tianjin 300071, Peoples R China
Keywords
DOI
10.1038/s41467-020-15851-3
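Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract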
Single-cell RNA sequencing (scRNA-seq) can characterize cell types and states through unsupervised clustering, but the ever-increasing number of cells and batch effects impose computational challenges. We present DESC, an unsupervised deep embedding algorithm that clusters scRNA-seq data by iteratively optimizing a clustering objective function. Through iterative self-learning, DESC gradually removes batch effects, as long as technical differences across batches are smaller than true biological variations. Because DESC is a soft clustering algorithm, its cluster assignment probabilities are biologically interpretable and can reveal both discrete and pseudotemporal structure of cells. Comprehensive evaluations show that DESC offers a proper balance of clustering accuracy and stability, has a small memory footprint, does not explicitly require batch information for batch effect removal, and can use a GPU when available. As the scale of single-cell studies continues to grow, we believe DESC will offer a valuable tool for biomedical researchers to disentangle complex cellular heterogeneity.
Increasingly large scRNA-seq datasets demand better and more scalable analysis tools. Here, the authors introduce a scalable unsupervised deep embedding algorithm that clusters scRNA-seq data by iteratively optimizing a clustering objective function and enables removal of batch effects.
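The abstract does not state the clustering objective itself. As a rough illustration of what "soft clustering" with "iterative self-learning" can look like, the sketch below assumes a deep-embedded-clustering style formulation: a Student's t kernel turns distances between embedded cells and cluster centroids into assignment probabilities, and a sharpened target distribution is derived from those probabilities for the next refinement step. The function names, the `alpha` parameter, and the toy 2-D embeddings are hypothetical, and the autoencoder and gradient updates are omitted; this is not taken from the DESC software.

```python
import math

def soft_assign(embeddings, centroids, alpha=1.0):
    """Soft assignment q[i][j] of cell i to cluster j (Student's t kernel).

    Assumption: this kernel is illustrative; the abstract gives no formula.
    """
    q = []
    for z in embeddings:
        weights = []
        for mu in centroids:
            dist_sq = sum((a - b) ** 2 for a, b in zip(z, mu))
            weights.append((1.0 + dist_sq / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(weights)
        q.append([w / total for w in weights])
    return q

def target_distribution(q):
    """Sharpened targets: p[i][j] proportional to q[i][j]^2 / sum_i q[i][j]."""
    n_clusters = len(q[0])
    freq = [sum(row[j] for row in q) for j in range(n_clusters)]
    p = []
    for row in q:
        weights = [row[j] ** 2 / freq[j] for j in range(n_clusters)]
        total = sum(weights)
        p.append([w / total for w in weights])
    return p

def kl_divergence(p, q):
    """KL(P || Q) summed over all cells and clusters."""
    return sum(
        pij * math.log(pij / qij)
        for prow, qrow in zip(p, q)
        for pij, qij in zip(prow, qrow)
        if pij > 0
    )

# Toy 2-D embeddings for four cells and two centroids (made-up values).
embeddings = [[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [2.8, 3.1]]
centroids = [[0.0, 0.0], [3.0, 3.0]]

q = soft_assign(embeddings, centroids)
p = target_distribution(q)
loss = kl_divergence(p, q)
hard_labels = [row.index(max(row)) for row in q]
```

Each row of `q` is a probability vector over clusters, which is the sense in which a soft clustering can be read as graded membership rather than a single label. In a full training loop, the embedding and centroids would be updated to reduce `loss` and the targets recomputed, repeating until assignments stabilize.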
Pages: 14