Prediction of Human Disease-specific Phosphorylation Sites with Combined Feature Selection Approach and Support Vector Machine

Times cited: 0
Authors
Xu, Xiaoyi [1]
Li, Ao [1]
Wang, Minghui [1]
Affiliation
[1] University of Science and Technology of China, School of Information Science and Technology, Hefei 230026, People's Republic of China
Source
2014 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) | 2014
Keywords
phosphorylation; disease-specific; feature selection; protein phosphorylation; identification; sequence
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline codes
081203; 0835
Abstract
Phosphorylation is a crucial post-translational modification that regulates almost all cellular processes. Protein phosphorylation has long been recognized as closely related to disease, and many studies have therefore sought to predict phosphorylation sites to support disease treatment and drug design. However, despite the success of these approaches, no existing method focuses on predicting disease-associated phosphorylation sites. Herein, we propose, for the first time, an approach based on support vector machines (SVM) that is specifically designed to identify disease-specific phosphorylation sites. Human disease-associated phosphorylation data are extracted from the PhosphoSitePlus database, and local sequences surrounding the sites are derived for training. To take full advantage of the sequence information, we develop a combined feature selection method-based SVM (CFS-SVM) that incorporates a minimum redundancy maximum relevance (mRMR) filtering step and a forward feature selection step. CFS-SVM successfully predicts disease-specific phosphorylation sites, and performance evaluation shows that it is significantly better than widely used classifiers, including Bayesian decision theory and k-nearest neighbours (kNN). Even at a specificity as high as 99%, CFS-SVM still achieves high sensitivity. In addition, analysis of the corresponding kinases and the selected features sheds light on the potential mechanisms of disease-phosphorylation relationships and can guide further experimental validation.
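The abstract describes the combined feature selection in CFS-SVM only at a high level. The Python below is a minimal sketch of one plausible reading, not the authors' implementation. It ranks features with greedy mRMR in its difference form, scoring each candidate by its relevance to the label minus its mean mutual information with already-selected features. It keeps the top `k` features and then runs greedy forward selection over those candidates, adding a feature only while a wrapper score improves. Several details are assumptions. Features are taken to be discrete, for example one-hot encoded residues from the local sequence window. The difference form of mRMR and the stopping rule are guesses. All function names are hypothetical. To keep the sketch dependency-free, the scorer `centroid_accuracy` is a nearest-centroid stand-in, not an SVM.

```python
"""Combined filter + wrapper feature selection: mRMR ranking, then greedy forward selection.

Illustrative sketch only (hypothetical names); features are assumed to be discrete,
e.g. one-hot encoded residues in a sequence window, and the wrapper scorer is pluggable.
"""
import math
from collections import Counter
from typing import Callable, List, Sequence

Matrix = List[List[int]]  # rows are samples, columns are discrete feature values
Scorer = Callable[[Matrix, Sequence[int]], float]


def mutual_information(a: Sequence[int], b: Sequence[int]) -> float:
    """Mutual information (in nats) between two discrete variables of equal length."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())


def mrmr_rank(X: Matrix, y: Sequence[int], k: int) -> List[int]:
    """Greedy mRMR (difference form): maximise relevance to y minus mean redundancy."""
    cols = [[row[j] for row in X] for j in range(len(X[0]))]
    relevance = [mutual_information(col, y) for col in cols]
    selected: List[int] = []
    remaining = list(range(len(cols)))

    def criterion(j: int) -> float:
        if not selected:
            return relevance[j]
        redundancy = sum(mutual_information(cols[j], cols[s]) for s in selected)
        return relevance[j] - redundancy / len(selected)

    while remaining and len(selected) < k:
        best = max(remaining, key=criterion)  # ties go to the lowest-index candidate
        selected.append(best)
        remaining.remove(best)
    return selected


def forward_select(X: Matrix, y: Sequence[int], candidates: Sequence[int], score: Scorer) -> List[int]:
    """Greedily add the candidate that most improves `score`; stop when nothing improves it."""
    chosen: List[int] = []
    best_score = float("-inf")
    pool = list(candidates)
    while pool:
        trials = [(score([[row[j] for j in chosen + [c]] for row in X], y), c) for c in pool]
        trial_score, c = max(trials, key=lambda t: t[0])  # earliest candidate wins ties
        if trial_score <= best_score:
            break
        best_score = trial_score
        chosen.append(c)
        pool.remove(c)
    return chosen


def centroid_accuracy(Xs: Matrix, y: Sequence[int]) -> float:
    """Stand-in scorer, NOT an SVM: training accuracy of a nearest-centroid rule."""
    labels = sorted(set(y))
    centroids = {}
    for lab in labels:
        rows = [r for r, t in zip(Xs, y) if t == lab]
        centroids[lab] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(r: List[int]) -> int:
        return min(labels, key=lambda lab: sum((v - m) ** 2 for v, m in zip(r, centroids[lab])))

    return sum(predict(r) == t for r, t in zip(Xs, y)) / len(y)


def cfs_select(X: Matrix, y: Sequence[int], k: int, score: Scorer) -> List[int]:
    """mRMR keeps the top-k candidates; forward selection then picks a subset of them."""
    return forward_select(X, y, mrmr_rank(X, y, k), score)


if __name__ == "__main__":
    # Toy data: f0 is informative, column 1 is an exact copy of f0, f2 is informative
    # and complementary to f0, f3 is pure noise.
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    f0 = [0, 0, 0, 1, 1, 1, 1, 1]
    f2 = [1, 0, 0, 0, 1, 1, 1, 1]
    f3 = [0, 1, 0, 1, 0, 1, 0, 1]
    X = [list(r) for r in zip(f0, f0, f2, f3)]
    print(mrmr_rank(X, y, 2))  # [0, 2]: the redundant copy in column 1 is skipped
    print(cfs_select(X, y, 2, centroid_accuracy))  # [0, 2]
```

In a real pipeline, the scorer would more likely be SVM performance estimated by cross-validation. With scikit-learn, for example, that could be `lambda Xs, y: cross_val_score(SVC(), Xs, y, cv=5).mean()`, using `SVC` and `cross_val_score`. Scoring on held-out folds matters here: a training-set score, like the toy scorer's, tends to reward overfitting.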
Pages: 8