A bi-Poisson model for clustering gene expression profiles by RNA-seq

Cited by: 6
Authors
Wang, Ningtao [1]
Wang, Yaqun [1]
Hao, Han [1]
Wang, Luojun [1]
Wang, Zhong
Wang, Jianxin [2]
Wu, Rongling [1, 3, 4]
Affiliations
[1] Penn State Univ, Hershey, PA 17033 USA
[2] Beijing Forestry Univ, Beijing, Peoples R China
[3] Penn State Univ, Ctr Stat Genet, Hershey, PA 17033 USA
[4] Beijing Forestry Univ, Ctr Computat Biol, Beijing, Peoples R China
Keywords
RNA-seq; Poisson distribution; EM algorithm; breast cancer cell lines; DIFFERENTIAL EXPRESSION; TRANSCRIPTION FACTORS; DYNAMICS
DOI
10.1093/bib/bbt029
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
With the availability of gene expression data by RNA-seq, powerful statistical approaches for grouping similar gene expression profiles across different environments have become increasingly important. We describe and assess a computational model for clustering genes into distinct groups based on the pattern of gene expression in response to changing environment. The model capitalizes on the Poisson distribution to capture the count property of RNA-seq data. A two-stage hierarchical expectation-maximization (EM) algorithm is implemented to estimate an optimal number of groups and mean expression amounts of each group across two environments. A procedure is formulated to test whether and how a given group shows a plastic response to environmental changes. The impact of gene-environment interactions on the phenotypic plasticity of the organism can also be visualized and characterized. The model was used to analyse an RNA-seq dataset measured from two cell lines of breast cancer that respond differently to an anti-cancer drug, from which genes associated with the resistance and sensitivity of the cell lines are identified. We performed simulation studies to validate the statistical behaviour of the model. The model provides a useful tool for clustering gene expression data by RNA-seq, facilitating our understanding of gene functions and networks.
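To make the clustering idea in the abstract concrete, here is a minimal sketch of a plain EM fit for a finite mixture in which each group has one Poisson mean per environment. It is not the authors' two-stage hierarchical EM, and it does not estimate the number of groups: `k` is supplied by the caller. The function names (`poisson_logpmf`, `em_bipoisson`, `hard_labels`), the assumption that the two counts are independent within a group, and the deterministic starting values are all illustrative choices, not taken from the paper.

```python
import math


def poisson_logpmf(y, lam):
    """Log of the Poisson probability mass function at count y with mean lam."""
    return y * math.log(lam) - lam - math.lgamma(y + 1)


def em_bipoisson(counts, k, n_iter=100):
    """Fit a k-group mixture to (y1, y2) read counts, one pair per gene.

    Assumes (for illustration) that the counts in the two environments are
    independent Poisson variables within a group. Requires len(counts) >= k.
    Returns (weights, means, resp), where resp[i][j] is the posterior
    probability that gene i belongs to group j.
    """
    n = len(counts)
    # Deterministic start (an assumption): order genes by the share of reads
    # in environment 1, split into k chunks, and use the chunk averages.
    order = sorted(range(n),
                   key=lambda i: counts[i][0] / (counts[i][0] + counts[i][1] + 1))
    means = []
    for j in range(k):
        chunk = order[j * n // k:(j + 1) * n // k]
        means.append(tuple(
            max(sum(counts[i][e] for i in chunk) / len(chunk), 0.5)
            for e in (0, 1)
        ))
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior group probabilities, computed in log space.
        resp = []
        for y1, y2 in counts:
            logp = [
                math.log(weights[j])
                + poisson_logpmf(y1, means[j][0])
                + poisson_logpmf(y2, means[j][1])
                for j in range(k)
            ]
            top = max(logp)
            unnorm = [math.exp(v - top) for v in logp]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: weighted proportions and weighted mean counts per group.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = tuple(
                max(sum(r[j] * y[e] for r, y in zip(resp, counts)) / nj, 1e-8)
                for e in (0, 1)
            )
    return weights, means, resp


def hard_labels(resp):
    """Assign each gene to the group with the largest posterior probability."""
    return [max(range(len(r)), key=r.__getitem__) for r in resp]


if __name__ == "__main__":
    # Made-up toy counts: five genes low-then-high, five genes high-then-low.
    toy = [(4, 48), (6, 52), (5, 50), (5, 49), (3, 55),
           (50, 5), (47, 6), (53, 4), (49, 3), (52, 7)]
    w, m, r = em_bipoisson(toy, k=2)
    print(w, m, hard_labels(r))
```

In this sketch, a group whose two fitted means differ markedly is a candidate for the kind of plastic response the abstract mentions; a formal test of that difference is not implemented here.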
Pages: 534-541
Number of pages: 8