Using non-uniform read distribution models to improve isoform expression inference in RNA-Seq

Cited by: 76
Authors
Wu, Zhengpeng
Wang, Xi
Zhang, Xuegong [1]
Affiliations
[1] Tsinghua Univ, TNLIST Dept Automat, MOE Key Lab Bioinformat, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
MESSENGER-RNA; TRANSCRIPTOME; DISEASE; PARKIN; CHIP;
DOI
10.1093/bioinformatics/btq696
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: RNA-Seq technology based on next-generation sequencing provides an unprecedented ability to study transcriptomes at high resolution and accuracy, and the potential to measure the expression of multiple isoforms from the same gene at high precision. Isoform expression can be inferred from RNA-Seq data by maximum likelihood estimation, using statistical models that assume sequenced reads are distributed uniformly along transcripts. The model needs modification when RNA-Seq data do not follow a uniform distribution.

Results: We proposed two kinds of curves, the global bias curve (GBC) and the local bias curves (LBCs), to describe the non-uniformity of read distributions for all genes in a transcriptome and for each gene, respectively. Incorporating the bias curves into the uniform read distribution (URD) model, we introduced non-URD (N-URD) models to infer isoform expression levels. In a series of systematic simulation studies, the proposed models outperform the original model in recovering major isoforms and the expression ratio of alternative isoforms. We also applied the new model to real RNA-Seq datasets and found that its inferences on expression ratios of alternative isoforms are more reasonable. The experiments indicate that incorporating N-URD information can improve the accuracy of modeling and inferring isoform expression in RNA-Seq.
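The contrast between a uniform and a bias-aware model can be illustrated with a small sketch. This is not the authors' implementation: it assumes a simple Poisson exon-count model fitted by an EM-style multiplicative update, and the function name `em_isoform_expression`, the per-exon `bias` factors (a crude stand-in for a bias curve), and the toy gene structure are all hypothetical.

```python
def em_isoform_expression(counts, structure, weights, n_iter=2000):
    """Maximum-likelihood isoform expression under an assumed Poisson model.

    counts[j]       : reads observed in exon j
    structure[i][j] : 1 if isoform i contains exon j, else 0
    weights[j]      : expected reads in exon j per unit of expression
                      (exon length for a uniform model, or exon length
                      scaled by a bias factor for a non-uniform one)
    Assumes every isoform contains at least one exon with positive weight.
    """
    n_iso, n_exon = len(structure), len(counts)
    # design[i][j]: expected reads in exon j per unit expression of isoform i
    design = [[structure[i][j] * weights[j] for j in range(n_exon)]
              for i in range(n_iso)]
    row_sums = [sum(row) for row in design]
    theta = [1.0] * n_iso  # positive starting point
    for _ in range(n_iter):
        # Expected read count in each exon under the current estimate
        rate = [sum(design[i][j] * theta[i] for i in range(n_iso))
                for j in range(n_exon)]
        # EM update for a Poisson model with a linear rate
        theta = [
            theta[i]
            * sum(design[i][j] * counts[j] / rate[j]
                  for j in range(n_exon) if rate[j] > 0)
            / row_sums[i]
            for i in range(n_iso)
        ]
    return theta


# Toy gene (hypothetical): isoform 0 uses exons 0, 1, 2; isoform 1 skips exon 1.
structure = [[1, 1, 1], [1, 0, 1]]
lengths = [100.0, 100.0, 100.0]
bias = [0.5, 0.5, 2.0]      # hypothetical per-exon bias, e.g. a 3' skew
true_theta = [2.0, 1.0]     # simulated expression levels

# Noise-free counts generated under the biased (non-uniform) model
nurd_weights = [length * b for length, b in zip(lengths, bias)]
counts = [
    nurd_weights[j] * sum(structure[i][j] * true_theta[i] for i in range(2))
    for j in range(3)
]

theta_urd = em_isoform_expression(counts, structure, lengths)
theta_nurd = em_isoform_expression(counts, structure, nurd_weights)
print("uniform weights:   ", theta_urd)
print("bias-aware weights:", theta_nurd)
```

In this toy setup the bias-aware fit recovers the simulated expression levels (2 and 1), while the uniform fit assigns most of the expression to the exon-skipping isoform (about 1 and 2.75), so it misidentifies the major isoform. The example only shows the direction of the effect under these assumptions; it says nothing about the size of the improvement reported in the paper.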
Pages: 502-508
Page count: 7