DeepPASTA: deep neural network based polyadenylation site analysis

Cited by: 33
Authors
Arefeen, Ashraful [1]
Xiao, Xinshu [2]
Jiang, Tao [1, 3, 4]
Affiliations
[1] Univ Calif Riverside, Dept Comp Sci & Engn, Riverside, CA 92521 USA
[2] Univ Calif Los Angeles, Dept Integrat Biol & Physiol, Los Angeles, CA 90095 USA
[3] Univ Calif Riverside, Inst Integrat Genome Biol, Riverside, CA 92521 USA
[4] Tsinghua Univ, Dept Comp Sci & Technol, Bioinformat Div, BNRIST, Beijing 100084, Peoples R China
Keywords
MESSENGER-RNA POLYADENYLATION; SECONDARY STRUCTURE; DNA; PREDICTION; SEQUENCE; IDENTIFICATION; MECHANISMS; CLEAVAGE; SIGNALS; GENOME
DOI
10.1093/bioinformatics/btz283
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Alternative polyadenylation (polyA) sites near the 3' end of a pre-mRNA create multiple mRNA transcripts with different 3' untranslated regions (3' UTRs). The sequence elements of a 3' UTR are essential for many biological activities such as mRNA stability, sub-cellular localization, protein translation, protein binding and translation efficiency. Moreover, numerous studies have reported correlations between diseases and the shortening (or lengthening) of 3' UTRs. As alternative polyA sites are common in mammalian genes, several machine learning tools have been published for predicting polyA sites from sequence data. These tools either consider limited sequence features or use relatively old algorithms for polyA site prediction. Moreover, none of the previous tools considers RNA secondary structure as a feature for predicting polyA sites.
Results: In this paper, we propose a new deep learning model, called DeepPASTA, for predicting polyA sites from both sequence and RNA secondary structure data. The model is then extended to predict tissue-specific polyA sites. Moreover, the tool can predict the most dominant (i.e. most frequently used) polyA site of a gene in a specific tissue, as well as the relative dominance when two polyA sites of the same gene are given. Our extensive experiments demonstrate that DeepPASTA significantly outperforms the existing tools for polyA site prediction and for tissue-specific relative and absolute dominant polyA site prediction.
Pages: 4577-4585
Page count: 9