EmptyNN: A neural network based on positive and unlabeled learning to remove cell-free droplets and recover lost cells in scRNA-seq data

Times cited: 12
Authors
Yan, Fangfang [1]
Zhao, Zhongming [1,2,3]
Simon, Lukas M. [4]
Affiliations
[1] Univ Texas Hlth Sci Ctr Houston, Sch Biomed Informat, Ctr Precis Hlth, Houston, TX 77030 USA
[2] Univ Texas Hlth Sci Ctr Houston, Sch Publ Hlth, Human Genet Ctr, Houston, TX 77030 USA
[3] UTHealth Grad Sch Biomed Sci, MD Anderson Canc Ctr, Houston, TX 77030 USA
[4] Baylor Coll Med, Therapeut Innovat Ctr, Houston, TX 77030 USA
Source
PATTERNS | 2021 / Vol. 2 / No. 8
Funding
National Institutes of Health (NIH);
Keywords
cell-calling algorithm; droplet-based single-cell transcriptomics; DSML 3: Development/pre-production: Data science output has been rolled out/validated across multiple domains/problems; empty droplets; neural networks; positive-unlabeled learning; single-cell RNA sequencing;
DOI
10.1016/j.patter.2021.100311
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Droplet-based single-cell RNA sequencing (scRNA-seq) has greatly increased the number of cells profiled per experiment and revolutionized the study of individual transcriptomes. However, to maximize the biological signal, robust computational methods are needed to distinguish cell-free droplets from cell-containing droplets. Here, we introduce a novel cell-calling algorithm, EmptyNN, which trains a neural network with positive-unlabeled learning to improve barcode filtering. For benchmarking, we used cell hashing and genetic variation to provide ground truth. EmptyNN accurately removed cell-free droplets while recovering lost cell clusters, achieving areas under the receiver operating characteristic curve (AUROC) of 94.73% and 96.30% on the two ground-truth benchmarks. Comparisons with current state-of-the-art cell-calling algorithms demonstrated the superior performance of EmptyNN. EmptyNN also performed well when applied to a single-nucleus RNA sequencing (snRNA-seq) dataset. EmptyNN therefore represents a powerful tool for enhancing quality control in both scRNA-seq and snRNA-seq analyses.
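The abstract states only that EmptyNN trains a neural network with positive-unlabeled (PU) learning; it does not give the training procedure. As a rough illustration of the PU idea, and not a reproduction of EmptyNN, the Python sketch below applies the generic correction of Elkan and Noto (2008): fit a classifier g(x) that separates labeled positives from unlabeled barcodes, estimate c = p(labeled | positive) as the mean score on held-out labeled positives, and rescale g(x)/c into p(positive | x). Several things here are assumptions: logistic regression stands in for the neural network, cell-free droplets are treated as the positive class, a single synthetic feature stands in for a per-barcode summary such as log total UMI count, labeled positives are a random sample of all positives (the "selected completely at random" assumption), and the helper names `fit_logistic` and `pu_probabilities` are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=1.0, epochs=1000):
    """Fit sigmoid(w*x + b) to 0/1 targets by batch gradient descent on log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # derivative of log-loss w.r.t. the logit
            gw += err * x
            gb += err
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b


def pu_probabilities(xs, labeled, holdout_frac=0.2, seed=0):
    """Elkan-Noto style PU correction (hypothetical helper, not EmptyNN's code).

    labeled[i] is True for a labeled positive, False for an unlabeled sample.
    Returns an estimate of p(y=1 | x) for every sample.
    """
    rng = random.Random(seed)
    pos = [i for i, s in enumerate(labeled) if s]
    rng.shuffle(pos)
    n_hold = max(1, int(holdout_frac * len(pos)))
    holdout = set(pos[:n_hold])
    train = [i for i in range(len(xs)) if i not in holdout]

    # Step 1: classifier g(x) ~ p(s=1 | x), i.e. labeled vs. unlabeled.
    w, b = fit_logistic([xs[i] for i in train],
                        [1.0 if labeled[i] else 0.0 for i in train])

    # Step 2: c = p(s=1 | y=1), estimated as the mean score on held-out labeled positives.
    c = sum(sigmoid(w * xs[i] + b) for i in holdout) / n_hold

    # Step 3: under random labeling, p(y=1 | x) = g(x) / c; clip to a valid probability.
    return [min(1.0, sigmoid(w * x + b) / c) for x in xs]


# Synthetic barcodes with one standardized feature (assumed stand-in for log total UMI count).
rng = random.Random(42)
empty_x = [rng.gauss(-1.0, 0.5) for _ in range(300)]  # cell-free droplets (positives)
cell_x = [rng.gauss(1.0, 0.5) for _ in range(300)]    # cell-containing droplets
xs = empty_x + cell_x
# Assumption: only a random ~40% of cell-free droplets carry a label; the rest is unlabeled.
labeled = [i < 300 and rng.random() < 0.4 for i in range(600)]

probs = pu_probabilities(xs, labeled)
mean_empty = sum(probs[:300]) / 300
mean_cell = sum(probs[300:]) / 300
print(f"mean p(cell-free): empties={mean_empty:.2f}, cells={mean_cell:.2f}")
```

Dividing by c is what distinguishes this from simply trusting the labeled-versus-unlabeled classifier: because only some positives carry a label, g(x) underestimates p(positive | x) by the factor c, so unlabeled barcodes that resemble labeled cell-free droplets receive scores near 1 rather than near c. If positives were instead chosen by a count threshold, the random-labeling assumption would not hold and the g(x)/c rescaling would no longer be justified as written.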
Pages: 11