CMF-Impute: an accurate imputation tool for single-cell RNA-seq data

Cited by: 85
Authors
Xu, Junlin [1]
Cai, Lijun [1]
Liao, Bo [2]
Zhu, Wen [2]
Yang, JiaLiang [2, 3]
Affiliations
[1] Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410082, Hunan, Peoples R China
[2] Hainan Normal Univ, Sch Math & Stat, Haikou 570100, Hainan, Peoples R China
[3] Geneis Beijing Co Ltd, Beijing 100102, Peoples R China
Keywords
MISSING VALUE ESTIMATION; LINEAGE
DOI
10.1093/bioinformatics/btaa109
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Motivation: Single-cell RNA-sequencing (scRNA-seq) technology provides a powerful tool for investigating cell heterogeneity and cell subpopulations by quantifying gene expression at the single-cell level. However, scRNA-seq data analysis remains challenging because of various sources of technical noise, such as dropout events (i.e. excessive zero counts in the expression matrix).
Results: By taking into account the associations among cells and genes, we propose a novel collaborative matrix factorization-based method called CMF-Impute to impute the dropout entries of a given scRNA-seq expression matrix. We test CMF-Impute and compare it with five other state-of-the-art methods on six popular real scRNA-seq datasets of various sizes and three simulated datasets. On the simulated datasets, CMF-Impute outperforms the other methods in recovering dropout values closest to the original expression values, as evaluated by both the sum of squared error and the Pearson correlation coefficient. On the real datasets, CMF-Impute achieves the most accurate cell classification results regardless of the clustering method used (SC3, or t-SNE followed by k-means), as evaluated by both the adjusted Rand index and normalized mutual information. Finally, we demonstrate that CMF-Impute is powerful in reconstructing cell-to-cell and gene-to-gene correlations, and in inferring cell lineage trajectories.
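The two simulation metrics named in the abstract, sum of squared error and Pearson correlation coefficient, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the boolean `mask` marking simulated dropout entries, and the toy matrices are all assumptions made for illustration, and the paper may compute these metrics over a different set of entries.

```python
import math

def sse(original, imputed, mask):
    """Sum of squared errors over the entries flagged True in `mask`."""
    return sum(
        (o - p) ** 2
        for row_o, row_p, row_m in zip(original, imputed, mask)
        for o, p, m in zip(row_o, row_p, row_m)
        if m
    )

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical toy data: rows are genes, columns are cells.
original = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]   # values before simulated dropout
imputed = [[1.5, 2.0, 2.5], [4.0, 5.5, 6.5]]    # values returned by some imputation method
mask = [[True, False, True], [False, True, True]]  # True where a dropout was simulated

# Collect the masked entries so the correlation is computed on dropouts only.
orig_vals = [o for ro, rm in zip(original, mask) for o, m in zip(ro, rm) if m]
imp_vals = [p for rp, rm in zip(imputed, mask) for p, m in zip(rp, rm) if m]

sse_value = sse(original, imputed, mask)
r = pearson(orig_vals, imp_vals)
print(f"SSE = {sse_value:.3f}, Pearson r = {r:.3f}")
```

A lower SSE and a Pearson coefficient closer to 1 both indicate imputed values nearer to the original expression values.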
Pages: 3139-3147
Number of pages: 9