Re-Annotation of Protein-Coding Genes in the Genome of Saccharomyces cerevisiae Based on Support Vector Machines

Cited by: 12
Authors
Lin, Dan [1 ]
Yin, Xin [1 ]
Wang, Xianlong [1 ]
Zhou, Peng [1 ]
Guo, Feng-Biao [1 ]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Life Sci & Technol, Ctr Bioinformat, Chengdu 610054, Peoples R China
Source
PLOS ONE | 2013 / Vol. 8 / Issue 07
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China;
Keywords
YEAST GENOME; CODON USAGE; RECOGNITION; SGD;
DOI
10.1371/journal.pone.0064477
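Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Subject classification code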
07 ; 0710 ; 09 ;
Abstract
The annotation of the well-studied organism Saccharomyces cerevisiae has improved over the past decade, yet the number of biologically significant open reading frames (ORFs) in the yeast genome remains under debate. We revisited the total count of protein-coding genes in the S. cerevisiae S288c genome using a theoretical approach that combines the Support Vector Machine (SVM) method with six widely used measures of sequence statistical features. The accuracy of our method is over 99.5% in 10-fold cross-validation. Based on the annotation data in the Saccharomyces Genome Database (SGD), we studied the coding capacity of all 1744 ORFs that lack experimental results and suggest that, after removing 488 spurious ORFs, the overall number of chromosomal ORFs encoding proteins in yeast should be 6091. The importance of the present work lies in at least two aspects. First, cross-validation and retrospective examination showed the fidelity of our method in recognizing ORFs that likely encode proteins. Second, we have provided a web service, accessible at http://cobi.uestc.edu.cn/services/yeast/, which enables high-accuracy prediction of protein-coding ORFs in the genus Saccharomyces.
Pages: 6
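
The abstract describes feeding sequence statistical features into an SVM and scoring it with 10-fold cross-validation. The sketch below illustrates only the two supporting steps, feature extraction and fold partitioning, and makes several assumptions: the abstract does not name the six measures, so relative codon frequencies plus GC content at the third codon position stand in purely as an illustration, and the helper names `codon_features` and `k_fold_indices` are hypothetical. The resulting vectors could be passed to any SVM implementation.

```python
from collections import Counter
from itertools import product

# All 64 codons in a fixed order, so every feature vector has the same layout.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_features(orf):
    """Return 64 relative codon frequencies followed by GC3.

    Illustrative features only (an assumption); the paper's own six
    measures are not listed in the abstract.
    """
    seq = orf.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # Skip codons containing ambiguous bases such as N.
    codons = [c for c in codons if set(c) <= set("ACGT")]
    if not codons:
        raise ValueError("sequence contains no complete unambiguous codon")
    counts = Counter(codons)
    total = len(codons)
    freqs = [counts[c] / total for c in CODONS]
    # GC3: fraction of codons whose third base is G or C.
    gc3 = sum(c[2] in "GC" for c in codons) / total
    return freqs + [gc3]


def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k-fold cross-validation.

    Fold i takes every k-th sample starting at index i, so each sample
    is used for testing exactly once.
    """
    for fold in range(k):
        test = [i for i in range(n_samples) if i % k == fold]
        train = [i for i in range(n_samples) if i % k != fold]
        yield train, test
```

In practice the samples would be shuffled (or stratified by class) before partitioning, so that each fold contains a comparable mix of coding and non-coding ORFs.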