DeepPASTA: deep neural network based polyadenylation site analysis

Cited by: 33

Authors:

Arefeen, Ashraful ^{[1]}

Xiao, Xinshu ^{[2]}

Jiang, Tao ^{[1,3,4]}

Affiliations:

[1] Univ Calif Riverside, Dept Comp Sci & Engn, Riverside, CA 92521 USA

[2] Univ Calif Los Angeles, Dept Integrat Biol & Physiol, Los Angeles, CA 90095 USA

[3] Univ Calif Riverside, Inst Integrat Genome Biol, Riverside, CA 92521 USA

[4] Tsinghua Univ, Dept Comp Sci & Technol, Bioinformat Div, BNRIST, Beijing 100084, Peoples R China

Source:

BIOINFORMATICS | 2019, Volume 35, Issue 22

Keywords:

MESSENGER-RNA POLYADENYLATION; SECONDARY STRUCTURE; DNA; PREDICTION; SEQUENCE; IDENTIFICATION; MECHANISMS; CLEAVAGE; SIGNALS; GENOME;

DOI:

10.1093/bioinformatics/btz283

Chinese Library Classification (CLC) number:

Q5 [Biochemistry];

Subject classification codes:

071010 ; 081704 ;

Abstract:

Motivation: Alternative polyadenylation (polyA) sites near the 3' end of a pre-mRNA create multiple mRNA transcripts with different 3' untranslated regions (3' UTRs). The sequence elements of a 3' UTR are essential for many biological activities such as mRNA stability, sub-cellular localization, protein translation, protein binding and translation efficiency. Moreover, numerous studies in the literature have reported a correlation between diseases and the shortening (or lengthening) of 3' UTRs. As alternative polyA sites are common in mammalian genes, several machine learning tools have been published for predicting polyA sites from sequence data. These tools either consider limited sequence features or use relatively old algorithms for polyA site prediction. Moreover, none of the previous tools considers RNA secondary structure as a feature for predicting polyA sites.

Results: In this paper, we propose a new deep learning model, called DeepPASTA, for predicting polyA sites from both sequence and RNA secondary structure data. The model is then extended to predict tissue-specific polyA sites. Moreover, the tool can predict the most dominant (i.e. most frequently used) polyA site of a gene in a specific tissue, as well as the relative dominance when two polyA sites of the same gene are given. Our extensive experiments demonstrate that DeepPASTA significantly outperforms the existing tools for polyA site prediction and for tissue-specific relative and absolute dominant polyA site prediction.

Pages: 4577-4585

Page count: 9
