DeepPASTA: deep neural network based polyadenylation site analysis

Cited by: 33
Authors
Arefeen, Ashraful [1]
Xiao, Xinshu [2]
Jiang, Tao [1, 3, 4]
Affiliations
[1] Univ Calif Riverside, Dept Comp Sci & Engn, Riverside, CA 92521 USA
[2] Univ Calif Los Angeles, Dept Integrat Biol & Physiol, Los Angeles, CA 90095 USA
[3] Univ Calif Riverside, Inst Integrat Genome Biol, Riverside, CA 92521 USA
[4] Tsinghua Univ, Dept Comp Sci & Technol, Bioinformat Div, BNRIST, Beijing 100084, Peoples R China
Keywords
MESSENGER-RNA POLYADENYLATION; SECONDARY STRUCTURE; DNA; PREDICTION; SEQUENCE; IDENTIFICATION; MECHANISMS; CLEAVAGE; SIGNALS; GENOME
DOI
10.1093/bioinformatics/btz283
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Alternative polyadenylation (polyA) sites near the 3' end of a pre-mRNA create multiple mRNA transcripts with different 3' untranslated regions (3' UTRs). The sequence elements of a 3' UTR are essential for many biological activities such as mRNA stability, sub-cellular localization, protein translation, protein binding and translation efficiency. Moreover, numerous studies in the literature have reported the correlation between diseases and the shortening (or lengthening) of 3' UTRs. As alternative polyA sites are common in mammalian genes, several machine learning tools have been published for predicting polyA sites from sequence data. These tools either consider limited sequence features or use relatively old algorithms for polyA site prediction. Moreover, none of the previous tools consider RNA secondary structures as a feature to predict polyA sites. Results: In this paper, we propose a new deep learning model, called DeepPASTA, for predicting polyA sites from both sequence and RNA secondary structure data. The model is then extended to predict tissue-specific polyA sites. Moreover, the tool can predict the most dominant (i.e. frequently used) polyA site of a gene in a specific tissue and relative dominance when two polyA sites of the same gene are given. Our extensive experiments demonstrate that DeepPASTA significantly outperforms the existing tools for polyA site prediction and tissue-specific relative and absolute dominant polyA site prediction.
Pages: 4577-4585
Page count: 9