Di-codon Usage for Gene Classification

Cited by: 0
Authors
Nguyen, Minh N. [1]
Ma, Jianmin [1]
Fogel, Gary B. [2]
Rajapakse, Jagath C. [3, 4, 5]
Affiliations
[1] BioInfomat Inst, Singapore, Singapore
[2] Nat Select Inc, San Diego, CA USA
[3] Nanyang Technol Univ, BioInformat Res Ctr, Singapore 639798, Singapore
[4] MIT Alliance, Singapore, Singapore
[5] MIT, Dept Biol Engn, Cambridge, MA 02139 USA
Source
PATTERN RECOGNITION IN BIOINFORMATICS, PROCEEDINGS | 2009 / Vol. 5780
Keywords
ESCHERICHIA-COLI; SACCHAROMYCES-CEREVISIAE; CANCER CLASSIFICATION; BACILLUS-SUBTILIS; BINDING PEPTIDES; PREDICTION; SVM; LYMPHOCYTES; IMGT/HLA;
DOI
Not available
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification codes
071010; 081704;
Abstract
Classification of genes into biologically related groups facilitates inference of their functions. Codon usage bias has previously been described as a potential feature for gene classification. In this paper, we demonstrate that di-codon usage can further improve the classification of genes. By using both codon and di-codon features, we achieve near-perfect accuracies for the classification of HLA molecules into major classes and sub-classes. The method is illustrated on 1,841 HLA sequences, which are classified into two major classes, HLA-I and HLA-II; the major classes are further classified into sub-groups. A binary SVM using di-codon usage patterns achieved 99.95% accuracy in classifying HLA genes into the major HLA classes, and a multi-class SVM achieved accuracies of 99.82% and 99.03% for sub-class classification of HLA-I and HLA-II genes, respectively. Furthermore, by combining codon and di-codon usage, the prediction accuracies reached 100%, 99.82%, and 99.84% for HLA major-class classification and for sub-class classification of HLA-I and HLA-II genes, respectively.
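As a rough illustration of the kind of features the abstract describes, the sketch below computes codon and di-codon usage vectors for a single DNA sequence. This record does not give the paper's exact feature definition, so several choices here are assumptions: the sequence is treated as in-frame coding DNA, a di-codon is counted as a pair of adjacent codons (sliding one codon at a time), counts are normalized to relative frequencies, and all function names are hypothetical. No classifier is included; the resulting vectors would simply be the input to an SVM.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
# 64 codons and 64 * 64 = 4096 di-codons, in a fixed order.
CODONS = ["".join(p) for p in product(BASES, repeat=3)]
DICODONS = [a + b for a in CODONS for b in CODONS]


def split_codons(seq):
    """Split an in-frame sequence into codons, dropping any trailing partial codon."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def usage_vector(tokens, vocabulary):
    """Relative frequency of each vocabulary item among the recognized tokens."""
    counts = Counter(tokens)
    # Tokens outside the vocabulary (e.g. containing N) are ignored.
    total = sum(counts[v] for v in vocabulary)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[v] / total for v in vocabulary]


def codon_usage(seq):
    """64-dimensional codon usage vector (assumed definition)."""
    return usage_vector(split_codons(seq), CODONS)


def dicodon_usage(seq):
    """4096-dimensional di-codon usage vector over adjacent codon pairs (assumed definition)."""
    codons = split_codons(seq)
    pairs = [codons[i] + codons[i + 1] for i in range(len(codons) - 1)]
    return usage_vector(pairs, DICODONS)


def combined_features(seq):
    """Concatenate codon and di-codon usage into one 4160-dimensional vector."""
    return codon_usage(seq) + dicodon_usage(seq)
```

For a toy sequence such as `"ATGGCCATGGCC"`, `codon_usage` assigns equal weight to `ATG` and `GCC`, while `dicodon_usage` distinguishes the ordered pairs `ATGGCC` and `GCCATG`. That ordering information is what di-codon features add over single-codon usage.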
Pages: 211 / +
Page count: 4