On using physico-chemical properties of amino acids in string kernels for protein classification via support vector machines

Citations: 0
Authors
Li Limin [1]
Aoki-Kinoshita, Kiyoko F. [2]
Ching Wai-Ki [3]
Jiang Hao [4]
Affiliations
[1] Xi An Jiao Tong Univ, Inst Informat & Syst Sci, Xian 710049, Peoples R China
[2] Soka Univ, Dept Bioinformat, Fac Engn, Tokyo, Japan
[3] Univ Hong Kong, Dept Math, Adv Modeling & Appl Comp Lab, Hong Kong, Hong Kong, Peoples R China
[4] Renmin Univ China, Sch Informat, Dept Math, Beijing 100872, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
AAindex; AA spectrum kernel; correlation spectrum kernel; physico-chemical properties; string kernel; weighted spectrum kernel; LECTIN; PREDICTION; GLYCOMICS;
DOI
10.1007/s11424-015-2156-y
Chinese Library Classification
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
String kernels are popular tools for analyzing protein sequence data, and they have been successfully applied to many computational biology problems. Traditional string kernels assume that different substrings are independent. However, substrings can be highly correlated because of their substructure relationships or shared physico-chemical properties. This paper proposes two kinds of weighted spectrum kernels: the correlation spectrum kernel and the AA spectrum kernel. We evaluate their performance by predicting the glycan-binding proteins of 12 glycans. The results show that the correlation spectrum kernel and the AA spectrum kernel perform significantly better than the spectrum kernel for nearly all of the 12 glycans. By comparing the predictive power of AA spectrum kernels constructed from different physico-chemical properties, we can also identify the physico-chemical properties that contribute the most to glycan-protein binding. The results indicate that the physico-chemical properties of amino acids in proteins play an important role in the mechanism of glycan-protein binding.
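The contrast the abstract draws between a plain spectrum kernel and a weighted one can be illustrated with a minimal Python sketch. It assumes the standard k-spectrum kernel (an inner product of k-mer counts) and a generic weighted form in which every pair of k-mers (u, v) contributes count_x(u) * weight(u, v) * count_y(v). The function names and the `hamming_similarity` weight are hypothetical stand-ins chosen for illustration. They are not the correlation or AA weights defined in the paper, which the abstract does not specify.

```python
from collections import Counter


def spectrum(seq, k):
    """Count every length-k substring (k-mer) of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x, y, k):
    """Plain k-spectrum kernel: inner product of the two k-mer count vectors.

    Different k-mers never interact, which is the independence
    assumption the abstract refers to.
    """
    fx, fy = spectrum(x, k), spectrum(y, k)
    return sum(count * fy[kmer] for kmer, count in fx.items())


def weighted_spectrum_kernel(x, y, k, weight):
    """Generic weighted form: sum of count_x(u) * weight(u, v) * count_y(v).

    `weight` is a hypothetical callable; for use in an SVM it should
    correspond to a positive semidefinite weight matrix.
    """
    fx, fy = spectrum(x, k), spectrum(y, k)
    return sum(cu * weight(u, v) * cv
               for u, cu in fx.items()
               for v, cv in fy.items())


def identity_weight(u, v):
    """Weight that recovers the plain spectrum kernel."""
    return 1.0 if u == v else 0.0


def hamming_similarity(u, v):
    """Illustrative weight (an assumption, not from the paper):
    fraction of positions at which two k-mers agree."""
    return sum(a == b for a, b in zip(u, v)) / len(u)


if __name__ == "__main__":
    print(spectrum_kernel("AAB", "AAC", 2))
    print(weighted_spectrum_kernel("AAB", "AAC", 2, hamming_similarity))
```

With the identity weight the weighted form reduces to the plain spectrum kernel. Any weight that scores distinct but related k-mers above zero lets partially matching substrings contribute to the similarity, which is the general idea behind weighting by substructure or by amino acid properties.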
Pages: 504-516
Page count: 13