DeepPASTA: deep neural network based polyadenylation site analysis

Cited by: 33
Authors
Arefeen, Ashraful [1]
Xiao, Xinshu [2]
Jiang, Tao [1, 3, 4]
Affiliations
[1] Univ Calif Riverside, Dept Comp Sci & Engn, Riverside, CA 92521 USA
[2] Univ Calif Los Angeles, Dept Integrat Biol & Physiol, Los Angeles, CA 90095 USA
[3] Univ Calif Riverside, Inst Integrat Genome Biol, Riverside, CA 92521 USA
[4] Tsinghua Univ, Dept Comp Sci & Technol, Bioinformat Div, BNRIST, Beijing 100084, Peoples R China
Keywords
MESSENGER-RNA POLYADENYLATION; SECONDARY STRUCTURE; DNA; PREDICTION; SEQUENCE; IDENTIFICATION; MECHANISMS; CLEAVAGE; SIGNALS; GENOME
DOI
10.1093/bioinformatics/btz283
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Alternative polyadenylation (polyA) sites near the 3' end of a pre-mRNA create multiple mRNA transcripts with different 3' untranslated regions (3' UTRs). The sequence elements of a 3' UTR are essential for many biological activities such as mRNA stability, sub-cellular localization, protein translation, protein binding and translation efficiency. Moreover, numerous studies in the literature have reported the correlation between diseases and the shortening (or lengthening) of 3' UTRs. As alternative polyA sites are common in mammalian genes, several machine learning tools have been published for predicting polyA sites from sequence data. These tools either consider limited sequence features or use relatively old algorithms for polyA site prediction. Moreover, none of the previous tools consider RNA secondary structures as a feature to predict polyA sites.
Results: In this paper, we propose a new deep learning model, called DeepPASTA, for predicting polyA sites from both sequence and RNA secondary structure data. The model is then extended to predict tissue-specific polyA sites. Moreover, the tool can predict the most dominant (i.e. frequently used) polyA site of a gene in a specific tissue and relative dominance when two polyA sites of the same gene are given. Our extensive experiments demonstrate that DeepPASTA significantly outperforms the existing tools for polyA site prediction and tissue-specific relative and absolute dominant polyA site prediction.
Pages: 4577-4585
Number of pages: 9