Transcriptome diversity is a systematic source of variation in RNA-sequencing data

Cited by: 10
Authors
Garcia-Nieto, Pablo [1 ]
Wang, Ban [1 ]
Fraser, Hunter B. [1 ]
Affiliations
[1] Stanford Univ, Dept Biol, Stanford, CA 94305 USA
Keywords
SEQ DATA; EXPRESSION; NORMALIZATION;
DOI
10.1371/journal.pcbi.1009939
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
RNA sequencing has been widely used as an essential tool to probe gene expression. While standard practices have been established to analyze RNA-seq data, it is still challenging to interpret and remove artifactual signals. Several biological and technical factors, such as sex, age, batches, and sequencing technology, have been found to bias these estimates. Probabilistic estimation of expression residuals (PEER), which infers broad variance components in gene expression measurements, has been used to account for some systematic effects, but it has remained challenging to interpret these PEER factors. Here we show that transcriptome diversity, a simple metric based on Shannon entropy, explains a large portion of variability in gene expression and is the strongest known factor encoded in PEER factors. We then show that transcriptome diversity has significant associations with multiple technical and biological variables across diverse organisms and datasets. In sum, transcriptome diversity provides a simple explanation for a major source of variation in both gene expression estimates and PEER covariates.
Author summary
Although the cells in every individual organism have nearly identical DNA sequences, they differ substantially in their function; for instance, neurons are very different from muscle cells. This is in large part because different genes are transcribed from DNA into RNA, a key step in the process known as gene expression. The measurement of RNA levels is an important tool in studying biology, but it is complicated by many potentially confounding factors. To account for this, computational methods can correct for unknown confounders, but these do not provide any information about what the confounders are. Here we show that transcriptome diversity, a simple metric based on Shannon entropy, explains a large portion of variability in both gene expression measurements and the confounding factors detected by a leading method. This prevalent factor provides a simple explanation for a primary source of variation in gene expression estimates.
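The abstract describes transcriptome diversity only as "a simple metric based on Shannon entropy". As a minimal sketch, and assuming the metric is the Shannon entropy of each sample's per-gene expression proportions (the exact definition and any normalization used in the paper are not given in this record), it could be computed as follows. The function name `transcriptome_diversity` and the `normalize` option are hypothetical and introduced here for illustration only.

```python
import math


def transcriptome_diversity(counts, normalize=False):
    """Shannon entropy of one sample's expression profile (illustrative sketch).

    counts    : non-negative expression values (e.g. read counts or TPM), one per gene.
    normalize : assumption, not taken from the source; if True, divide by log(number
                of genes) so the result lies between 0 and 1.
    """
    if any(c < 0 for c in counts):
        raise ValueError("expression values must be non-negative")
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no expression signal")

    # Convert expression values to proportions; genes with zero expression
    # contribute nothing to the entropy (the limit of p * log p as p -> 0 is 0).
    proportions = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log(p) for p in proportions)

    if normalize:
        n_genes = len(counts)
        return entropy / math.log(n_genes) if n_genes > 1 else 0.0
    return entropy


# Hypothetical toy samples: one with evenly spread expression,
# one dominated by a single highly expressed gene.
even_sample = [10, 10, 10, 10]
dominated_sample = [97, 1, 1, 1]
print(transcriptome_diversity(even_sample))
print(transcriptome_diversity(dominated_sample))
```

Under this definition, a sample whose reads are spread evenly across genes has high diversity, while a sample dominated by a few highly expressed genes has low diversity.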
Pages: 20