A classification-based prediction model of messenger RNA polyadenylation sites

Cited by: 30

Authors:

Ji, Guoli [1]

Wu, Xiaohui [1]

Shen, Yingjia [2]

Huang, Jiangyin [1]

Li, Qingshun Quinn [2]

Affiliations:

[1] Xiamen Univ, Dept Automat, Xiamen 361000, Peoples R China

[2] Miami Univ, Dept Bot, Oxford, OH 45056 USA

Source:

JOURNAL OF THEORETICAL BIOLOGY | 2010 / Volume 265 / Issue 03

Funding:

National Science Foundation (USA); National Natural Science Foundation of China;

Keywords:

Arabidopsis; Classification-based modeling; Genome annotation; Polyadenylation; Predictive modeling; AMINO-ACID-COMPOSITION; PROTEIN STRUCTURAL CLASSES; SUBCELLULAR LOCATION PREDICTION; SUPPORT VECTOR MACHINE; ALTERNATIVE POLYADENYLATION; CHLAMYDOMONAS-REINHARDTII; WEB-SERVER; RECOGNITION; SIGNALS; GENOME;

DOI:

10.1016/j.jtbi.2010.05.015

Chinese Library Classification (CLC) number:

Q [Biological Sciences];

Discipline classification code:

07 ; 0710 ; 09 ;

Abstract:

Messenger RNA polyadenylation is one of the essential processing steps during eukaryotic gene expression. The site of polyadenylation [poly(A) site] marks the end of a transcript, which is also the end of a gene. A computational program that can recognize poly(A) sites would be useful not only for genome annotation, in finding gene ends, but also for predicting alternative poly(A) sites. Features that define poly(A) sites can now be extracted from poly(A) site datasets to build such predictive models. Using methods including K-gram pattern, Z-curve, position-specific scoring matrix and first-order inhomogeneous Markov sub-model, numerous features were generated and placed in an original feature space. To select the most useful features, attribute selection algorithms, such as information gain and entropy, were employed. A training model was then built based on the Bayesian network to determine a subset of the optimal features. Test models corresponding to the training models were built to predict poly(A) sites in Arabidopsis and rice. Thus, a prediction model, termed Poly(A) site classifier, or PAC, was constructed. The uniqueness of the model lies in its structure: each sub-model can be replaced or expanded, while feature generation, selection and classification are all independent processes. Its modular design makes it easily adaptable to different species or datasets. The algorithm's high specificity and sensitivity were demonstrated by testing several datasets; at the best combinations, both reached 95%. The software package may be used for genome annotation and for optimizing transgene structure. © 2010 Elsevier Ltd. All rights reserved.
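The abstract names several feature generators and a selection criterion without giving formulas. As a rough illustration only, the Python sketch below shows what K-gram counts, a Z-curve summary, an information-gain score and a sensitivity/specificity check might look like. It is not the PAC implementation: the function names, the DNA alphabet `ACGT`, the end-point form of the Z-curve and the TP/(TP+FN), TN/(TN+FP) definitions are all assumptions made for the example, and the position-specific scoring matrix, Markov sub-model and Bayesian network steps are omitted.

```python
from itertools import product
from math import log2


def kgram_counts(seq, k=2):
    """Count overlapping k-grams over ACGT (hypothetical helper, not from PAC)."""
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    for i in range(len(seq) - k + 1):
        gram = seq[i:i + k]
        if gram in counts:  # skip grams containing N or other symbols
            counts[gram] += 1
    return counts


def z_curve(seq):
    """Z-curve coordinates at the end of the sequence (assumed standard form)."""
    a, c, g, t = (seq.count(b) for b in "ACGT")
    x = (a + g) - (c + t)  # purine vs pyrimidine
    y = (a + c) - (g + t)  # amino vs keto
    z = (a + t) - (c + g)  # weak vs strong hydrogen bonding
    return x, y, z


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    h = 0.0
    for value in set(labels):
        p = labels.count(value) / n
        h -= p * log2(p)
    return h


def information_gain(feature_values, labels):
    """Entropy reduction in the labels after splitting on a discrete feature."""
    n = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [l for f, l in zip(feature_values, labels) if f == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def sensitivity_specificity(y_true, y_pred):
    """Assumed definitions: TP/(TP+FN) and TN/(TN+FP); needs both classes present."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)
```

In a pipeline of the kind the abstract describes, each candidate feature would be scored (here with `information_gain`) and only the top-ranked ones passed to the classifier. Keeping generation, selection and evaluation as separate functions mirrors the modular structure the abstract emphasizes.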


Pages: 287-296

Number of pages: 10

