RUV-III-NB: normalization of single cell RNA-seq data

Cited by: 5
Authors
Salim, Agus [1, 2, 3, 4, 5]
Molania, Ramyar [2]
Wang, Jianan [2, 6]
De Livera, Alysha [1, 2, 4, 5, 7]
Thijssen, Rachel [8]
Speed, Terence P. [2, 3]
Affiliations
[1] Univ Melbourne, Melbourne Sch Populat & Global Hlth, Melbourne, Vic 3053, Australia
[2] Walter & Eliza Hall Inst Med Res, Bioinformat Div, Parkville, Vic 3052, Australia
[3] Univ Melbourne, Sch Math & Stat, Melbourne, Vic 3010, Australia
[4] Baker Heart & Diabet Inst, Melbourne, Vic 3004, Australia
[5] La Trobe Univ, Dept Math & Stat, Bundoora, Vic 3086, Australia
[6] Univ Melbourne, Dept Med Biol, Melbourne, Vic 3010, Australia
[7] RMIT Univ, Sch Sci, Melbourne, Vic 3000, Australia
[8] Walter & Eliza Hall Inst Med Res, Blood Cells & Blood Canc Div, Parkville, Vic 3052, Australia
Funding
UK Medical Research Council
Keywords
UNWANTED VARIATION; SEQUENCING DATA; EXPRESSION;
DOI
10.1093/nar/gkac486
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Normalization of single cell RNA-seq data remains a challenging task. The performance of different methods can vary greatly between datasets when unwanted factors and biology are associated. Most normalization methods also only remove the effects of unwanted variation for the cell embedding but not from gene-level data typically used for differential expression (DE) analysis to identify marker genes. We propose RUV-III-NB, a method that can be used to remove unwanted variation from both the cell embedding and gene-level counts. Using pseudo-replicates, RUV-III-NB explicitly takes into account potential association with biology when removing unwanted variation. The method can be used for either UMI or read counts and returns adjusted counts that can be used for downstream analyses such as clustering, DE and pseudotime analyses. Using published datasets with different technological platforms, kinds of biology and levels of association between biology and unwanted variation, we show that RUV-III-NB manages to remove library size and batch effects, strengthen biological signals, improve DE analyses, and lead to results exhibiting greater concordance with independent datasets of the same kind. The performance of RUV-III-NB is consistent and is not sensitive to the number of factors assumed to contribute to the unwanted variation.
Pages: 15