RIA: a novel Regression-based Imputation Approach for single-cell RNA sequencing

Cited by: 6
Authors
Bang Tran [1]
Duc Tran [1]
Hung Nguyen [1]
Nam Sy Vo [2]
Tin Nguyen [1]
Affiliations
[1] Univ Nevada, Comp Sci & Engn, Reno, NV 89557 USA
[2] Vingrp Big Data Inst, Computat Biomed, Hanoi, Vietnam
Source
PROCEEDINGS OF 2019 11TH INTERNATIONAL CONFERENCE ON KNOWLEDGE AND SYSTEMS ENGINEERING (KSE 2019) | 2019
Funding
National Aeronautics and Space Administration (NASA);
Keywords
single cell; scRNA-seq; imputation; sequencing; GENE-EXPRESSION; HETEROGENEITY; EMBRYOS; FATE;
DOI
10.1109/kse.2019.8919334
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Advances in single-cell technologies have shifted genomics research from the analysis of bulk tissues toward a comprehensive characterization of individual cells. This shift opens enormous opportunities for both basic biology and clinical research. In particular, identifying and characterizing short-lived progenitors, stem cells, cancer stem cells, or circulating tumor cells is essential to better understand both normal and diseased tissue biology. However, quantifying gene expression in each cell remains a significant challenge because of the low amount of mRNA available within individual cells, which leads to an excess of zero counts caused by dropout events. Here we introduce RIA, a regression-based approach that reliably recovers the missing values in single-cell data and thus can improve the performance of downstream analyses. We compare RIA with state-of-the-art methods using five scRNA-seq datasets with a total of 3,535 cells. In each dataset analyzed, RIA outperforms existing approaches in improving the identification of cell populations while preserving the biological landscape. We also demonstrate that RIA is able to infer temporal trajectories of embryonic development stages.
Pages: 229-237
Number of pages: 9