Re-Annotation of Protein-Coding Genes in the Genome of Saccharomyces cerevisiae Based on Support Vector Machines
Cited by: 12
Authors:
Lin, Dan [1]
Yin, Xin [1]
Wang, Xianlong [1]
Zhou, Peng [1]
Guo, Feng-Biao [1]
Affiliation:
[1] Univ Elect Sci & Technol China, Sch Life Sci & Technol, Ctr Bioinformat, Chengdu 610054, Peoples R China
Source:
PLOS ONE | 2013 / Vol. 8 / Issue 7
Funding:
China Postdoctoral Science Foundation;
National Natural Science Foundation of China;
Keywords:
YEAST GENOME;
CODON USAGE;
RECOGNITION;
SGD;
DOI:
10.1371/journal.pone.0064477
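Chinese Library Classification (CLC) codes:
O [Mathematical Sciences and Chemistry];
P [Astronomy and Earth Sciences];
Q [Biological Sciences];
N [General Natural Sciences];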
Subject classification codes:
07;
0710;
09;
Abstract:
The annotation of Saccharomyces cerevisiae, a well-studied model organism, has improved over the past decade, yet debate continues over how many open reading frames (ORFs) in the yeast genome are biologically significant. We revisited the total count of protein-coding genes in the S. cerevisiae S288c genome using a theoretical approach that combines the Support Vector Machine (SVM) method with six widely used measures of sequence statistical features. Our method achieves an accuracy above 99.5% in 10-fold cross-validation. Using the annotation data in the Saccharomyces Genome Database (SGD), we assessed the coding capacity of all 1744 ORFs that lack experimental evidence and, after removing 488 spurious ORFs, suggest that the overall number of protein-coding chromosomal ORFs in yeast should be 6091. The present work is significant in at least two respects. First, cross-validation and retrospective examination demonstrated the reliability of our method in recognizing ORFs that are likely to encode proteins. Second, we provide a web service, accessible at http://cobi.uestc.edu.cn/services/yeast/, that predicts protein-coding ORFs in the genus Saccharomyces with high accuracy.
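The abstract does not list the six sequence statistical features, but codon usage appears among the record's keywords. As an illustration only, the sketch below assumes codon usage is one such feature and shows how an ORF might be turned into a fixed-length 64-dimensional vector, plus how 10-fold cross-validation partitions a dataset into train/test folds. The function names `codon_usage_vector` and `k_fold_indices` are hypothetical, and this is not the authors' implementation; the SVM training step itself is omitted.

```python
from itertools import product

# All 64 codons in a fixed order, so every ORF maps to a vector of the same length.
CODONS = ["".join(c) for c in product("TCAG", repeat=3)]


def codon_usage_vector(orf: str) -> list[float]:
    """Return the relative frequency of each of the 64 codons in an ORF.

    Hypothetical helper: reads the sequence in frame 0 and skips any
    trailing partial codon or codon containing a non-ACGT character (e.g. N).
    """
    seq = orf.upper()
    counts = dict.fromkeys(CODONS, 0)
    total = 0
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in counts:
            counts[codon] += 1
            total += 1
    if total == 0:
        return [0.0] * len(CODONS)
    return [counts[c] / total for c in CODONS]


def k_fold_indices(n: int, k: int = 10) -> list[tuple[list[int], list[int]]]:
    """Split indices 0..n-1 into k contiguous folds and return (train, test) pairs.

    Hypothetical helper: every index lands in exactly one test fold; the first
    n % k folds receive one extra element.
    """
    if not 1 < k <= n:
        raise ValueError("k must satisfy 1 < k <= n")
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        splits.append((train, test))
        start += size
    return splits
```

In practice one would shuffle the labeled ORFs before splitting, train a classifier (for example scikit-learn's `sklearn.svm.SVC`) on each training fold, and report accuracy averaged over the ten held-out test folds.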
Pages: 6