Transcriptome diversity is a systematic source of variation in RNA-sequencing data

Cited by: 10
Authors
Garcia-Nieto, Pablo [1 ]
Wang, Ban [1 ]
Fraser, Hunter B. [1 ]
Affiliations
[1] Stanford Univ, Dept Biol, Stanford, CA 94305 USA
Keywords
SEQ DATA; EXPRESSION; NORMALIZATION
DOI
10.1371/journal.pcbi.1009939
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
RNA sequencing has been widely used as an essential tool to probe gene expression. While standard practices have been established to analyze RNA-seq data, it is still challenging to interpret and remove artifactual signals. Several biological and technical factors, such as sex, age, batch, and sequencing technology, have been found to bias expression estimates. Probabilistic estimation of expression residuals (PEER), which infers broad variance components in gene expression measurements, has been used to account for some systematic effects, but these PEER factors have remained difficult to interpret. Here we show that transcriptome diversity, a simple metric based on Shannon entropy, explains a large portion of variability in gene expression and is the strongest known factor encoded in PEER factors. We then show that transcriptome diversity has significant associations with multiple technical and biological variables across diverse organisms and datasets. In sum, transcriptome diversity provides a simple explanation for a major source of variation in both gene expression estimates and PEER covariates.

Author summary
Although the cells in every individual organism have nearly identical DNA sequences, they differ substantially in their function; for instance, neurons are very different from muscle cells. This is in large part because different genes are transcribed from DNA into RNA, a key step in the process known as gene expression. The measurement of RNA levels is an important tool in studying biology, but it is complicated by many potentially confounding factors. Computational methods can correct for unknown confounders, but they do not provide any information about what these confounders are. Here we show that transcriptome diversity, a simple metric based on Shannon entropy, explains a large portion of variability in both gene expression measurements and the confounding factors detected by a leading method. This prevalent factor provides a simple explanation for a primary source of variation in gene expression estimates.
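
The abstract describes transcriptome diversity only as "a simple metric based on Shannon entropy" and does not give the exact formula. As a minimal sketch, assuming each gene's share of a sample's total expression is treated as a probability p_i and the diversity is H = -Σ p_i ln p_i, the metric could be computed as below. The function name `transcriptome_diversity`, the use of raw counts, and the natural-log base are illustrative assumptions, not the authors' stated method.

```python
import math


def transcriptome_diversity(counts):
    """Shannon entropy (in nats) of one sample's expression proportions.

    `counts` is a sequence of non-negative expression values, one per gene.
    Assumption: proportions are counts divided by the sample total; the
    paper's exact normalization is not given in this record.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no expressed genes")
    entropy = 0.0
    for c in counts:
        if c > 0:  # genes with zero expression contribute nothing
            p = c / total
            entropy -= p * math.log(p)
    return entropy


# Hypothetical toy samples: expression spread evenly vs. dominated by one gene.
even = transcriptome_diversity([25, 25, 25, 25])
skewed = transcriptome_diversity([97, 1, 1, 1])
```

Under this definition, a sample whose reads are spread evenly across genes scores higher than one dominated by a few highly expressed genes, so `even` exceeds `skewed` here.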
Pages: 20