Di-codon Usage for Gene Classification

Cited by: 0
Authors
Nguyen, Minh N. [1]
Ma, Jianmin [1]
Fogel, Gary B. [2]
Rajapakse, Jagath C. [3, 4, 5]
Affiliations
[1] BioInfomat Inst, Singapore, Singapore
[2] Nat Select Inc, San Diego, CA USA
[3] Nanyang Technol Univ, BioInformat Res Ctr, Singapore 639798, Singapore
[4] MIT Alliance, Singapore, Singapore
[5] MIT, Dept Biol Engn, Cambridge, MA 02139 USA
Source
PATTERN RECOGNITION IN BIOINFORMATICS, PROCEEDINGS | 2009 / Vol. 5780
Keywords
ESCHERICHIA-COLI; SACCHAROMYCES-CEREVISIAE; CANCER CLASSIFICATION; BACILLUS-SUBTILIS; BINDING PEPTIDES; PREDICTION; SVM; LYMPHOCYTES; IMGT/HLA
DOI
Not available
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Classification of genes into biologically related groups facilitates inference of their functions. Codon usage bias has been described previously as a potential feature for gene classification. In this paper, we demonstrate that di-codon usage can further improve classification of genes. By using both codon and di-codon features, we achieve near-perfect accuracies for the classification of HLA molecules into major classes and sub-classes. The method is illustrated on 1,841 HLA sequences, which are classified into two major classes, HLA-I and HLA-II. The major classes are further classified into sub-groups. A binary SVM using di-codon usage patterns achieved 99.95% accuracy in the classification of HLA genes into major HLA classes, and a multi-class SVM achieved accuracy rates of 99.82% and 99.03% for sub-class classification of HLA-I and HLA-II genes, respectively. Furthermore, by combining codon and di-codon usages, the prediction accuracies reached 100%, 99.82%, and 99.84% for HLA major class classification and for sub-class classification of HLA-I and HLA-II genes, respectively.
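The abstract does not spell out how the codon and di-codon usage features are computed, so the following is only a minimal sketch under stated assumptions: the input is an in-frame coding sequence over A, C, G, T; a di-codon is taken to be a pair of adjacent codons counted with a sliding step of one codon; and usage is the relative frequency over the 64 codons or 4,096 codon pairs. The function names (`split_codons`, `usage_vector`, `codon_usage`, `dicodon_usage`) are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
CODONS = ["".join(p) for p in product(BASES, repeat=3)]  # 64 codons
DICODONS = [a + b for a in CODONS for b in CODONS]       # 4096 codon pairs


def split_codons(seq):
    """Split an in-frame sequence into codons, dropping a trailing partial codon."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def usage_vector(items, vocabulary):
    """Relative frequency of each vocabulary entry among the observed items.

    Items outside the vocabulary (e.g. codons containing 'N') are ignored.
    """
    counts = Counter(items)
    total = sum(counts[v] for v in vocabulary)
    return [counts[v] / total if total else 0.0 for v in vocabulary]


def codon_usage(seq):
    """64-dimensional codon usage vector (assumed definition)."""
    return usage_vector(split_codons(seq), CODONS)


def dicodon_usage(seq):
    """4096-dimensional di-codon usage vector over adjacent codon pairs (assumed definition)."""
    codons = split_codons(seq)
    pairs = [codons[i] + codons[i + 1] for i in range(len(codons) - 1)]
    return usage_vector(pairs, DICODONS)


def combined_features(seq):
    """Concatenate codon and di-codon usage into one 4160-dimensional vector."""
    return codon_usage(seq) + dicodon_usage(seq)
```

Under these assumptions, vectors such as `combined_features(seq)` could be passed to any SVM implementation as fixed-length inputs; the kernel and parameters used by the authors are not stated in this record.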
Pages: 211 / +
Page count: 4
Related papers
36 in total
  • [1] Prediction of CTL epitopes using QM, SVM and ANN techniques
    Bhasin, M
    Raghava, GPS
    [J]. VACCINE, 2004, 22 (23-24) : 3195 - 3204
  • [2] SVM based method for predicting HLA-DRB1*0401 binding peptides in an antigen sequence
    Bhasin, M
    Raghava, GPS
    [J]. BIOINFORMATICS, 2004, 20 (03) : 421 - 423
  • [3] BODMER JG, 1995, TISSUE ANTIGENS, V46, P1
  • [4] Chang C.-C., LIBSVM: a Library for Support Vector Machines
  • [5] On the learnability and design of output codes for multiclass problems
    Crammer, K
    Singer, Y
    [J]. MACHINE LEARNING, 2002, 47 (2-3) : 201 - 233
  • [6] Cristianini N., 2000, INTRO SUPPORT VECTOR
  • [7] Prediction of MHC class I binding peptides, using SVMHC
    Dönnes, P
    Elofsson, A
    [J]. BMC BIOINFORMATICS, 2002, 3 (1) : art. no. 25
  • [8] Multiple SVM-RFE for gene selection in cancer classification with expression data
    Duan, KB
    Rajapakse, JC
    Wang, HY
    Azuaje, F
    [J]. IEEE TRANSACTIONS ON NANOBIOSCIENCE, 2005, 4 (03) : 228 - 234
  • [9] The molecular biology database collection: 2004 update
    Galperin, MY
    [J]. NUCLEIC ACIDS RESEARCH, 2004, 32 : D3 - D22
  • [10] Euclidian space and grouping of biological objects
    Grishin, VN
    Grishin, NV
    [J]. BIOINFORMATICS, 2002, 18 (11) : 1523 - 1533