Identification of functionally diverse lipocalin proteins from sequence information using support vector machine

Cited by: 0
Authors
Ganesan Pugalenthi
Krishna Kumar Kandaswamy
P. N. Suganthan
G. Archunan
R. Sowdhamini
Affiliations
[1] Nanyang Technological University, School of Electrical and Electronic Engineering
[2] University of Lübeck, Institute for Neuro- and Bioinformatics
[3] University of Lübeck, Graduate School for Computing in Medicine and Life Sciences
[4] Bharathidasan University, Center for Pheromone Technology, Department of Animal Science
[5] National Centre for Biological Sciences
Source
Amino Acids | 2010, Vol. 39
Keywords
Lipocalin; Diverse function; Odorant binding; Support vector machine; Ligand binding; Allergenic proteins; Salivary proteins;
DOI
Not available
Abstract
Lipocalins are functionally diverse proteins composed of 120–180 amino acid residues. Members of this family have several important biological functions, including ligand transport, cryptic coloration, sensory transduction, endonuclease activity, stress response activity in plants, odorant binding, prostaglandin biosynthesis, cellular homeostasis regulation, immunity and immunotherapy. Identifying lipocalins from protein sequence is challenging because sequence identity among family members is poor and often falls below the twilight zone. So far, no specific method has been reported to identify lipocalins from primary sequence. In this paper, we report a support vector machine (SVM) approach, LipoPred, to predict lipocalins from protein sequence using sequence-derived properties. LipoPred was trained using a dataset consisting of 325 lipocalin proteins and 325 non-lipocalin proteins, and evaluated on an independent test set of 140 lipocalin proteins and 21,447 non-lipocalin proteins. LipoPred achieved 88.61% accuracy with 89.26% sensitivity, 85.27% specificity and a Matthews correlation coefficient (MCC) of 0.74. When applied to the independent test dataset, LipoPred achieved 84.25% accuracy with 88.57% sensitivity, 84.22% specificity and an MCC of 0.16. LipoPred performed better than the PSI-BLAST, HMM and SVM-Prot methods. Out of 218 lipocalins, LipoPred correctly predicted 194 proteins, including 39 lipocalins that are non-homologous to any protein in the SWISSPROT database. This result shows that LipoPred is potentially useful for predicting lipocalin proteins that have no sequence homologs in the sequence databases. Further, the successful prediction of nine hypothetical lipocalin proteins and five new members of the lipocalin family indicates that LipoPred can be used efficiently to identify and annotate new lipocalin proteins from sequence databases. The LipoPred software and dataset are available at http://www3.ntu.edu.sg/home/EPNSugan/index_files/lipopred.htm.
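The independent-test figures quoted in the abstract can be cross-checked with a short calculation. The Python sketch below reconstructs approximate confusion-matrix counts from the stated sensitivity and specificity, then recomputes accuracy and MCC. The rounded counts are an assumption, since the abstract reports only percentages, and the function names are illustrative; none of this is part of the LipoPred software.

```python
import math

def confusion_from_rates(n_pos, n_neg, sensitivity, specificity):
    """Estimate (tp, fp, tn, fn) from class sizes and reported rates.

    Assumption: counts are recovered by rounding to the nearest integer.
    """
    tp = round(n_pos * sensitivity)
    tn = round(n_neg * specificity)
    return tp, n_neg - tn, tn, n_pos - tp

def accuracy(tp, fp, tn, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 if any marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Independent test set as described in the abstract:
# 140 lipocalins, 21,447 non-lipocalins, 88.57% sensitivity, 84.22% specificity.
tp, fp, tn, fn = confusion_from_rates(140, 21447, 0.8857, 0.8422)
test_accuracy = accuracy(tp, fp, tn, fn)
test_mcc = mcc(tp, fp, tn, fn)

print(f"TP={tp} FP={fp} TN={tn} FN={fn}")
print(f"accuracy={test_accuracy:.4f} MCC={test_mcc:.2f}")
```

Under these assumed counts the recomputed accuracy and MCC agree with the reported 84.25% and 0.16. The low MCC alongside high sensitivity and specificity is what one would expect from the class imbalance: with roughly 150 non-lipocalins per lipocalin, a specificity of about 84% still yields far more false positives than true positives.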
Pages: 777–783
Number of pages: 6