Adaptive compressive learning for prediction of protein-protein interactions from primary sequence

Times cited: 42
Authors
Zhang, Ya-Nan [1, 2]
Pan, Xiao-Yong [1, 2]
Huang, Yan [3]
Shen, Hong-Bin [1, 2]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Automat, Shanghai 200240, Peoples R China
[2] Minist Educ China, Key Lab Syst Control & Informat Proc, Shanghai 200240, Peoples R China
[3] Chinese Acad Sci, Shanghai Inst Tech Phys, Natl Lab Infrared Phys, Shanghai 200083, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Protein-protein interactions prediction; Sequential discrete representation; Compressed sensing; Nyquist sampling; SUPPORT VECTOR MACHINE; SUBCELLULAR LOCATION PREDICTION; RESTRICTED ISOMETRY PROPERTY; SIGNAL RECOVERY; SACCHAROMYCES-CEREVISIAE; ENSEMBLE CLASSIFIER; NETWORKS; SET; IDENTIFICATION; HYPERPLANES;
DOI
10.1016/j.jtbi.2011.05.023
Chinese Library Classification (CLC) code
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
Protein-protein interactions (PPIs) play an important role in biological processes. Although much effort has been devoted to identifying novel PPIs by integrating experimental biological knowledge, many difficulties remain because of insufficient protein structural and functional information. It is therefore highly desirable to develop methods that predict PPIs from amino acid sequences alone. However, sequence-based predictors often struggle with high dimensionality, which causes over-fitting and high computational complexity, as well as with redundancy in sequential feature vectors. In this paper, a novel computational approach based on compressed sensing theory is proposed to predict yeast Saccharomyces cerevisiae PPIs from primary sequence, and it achieves promising results. The key advantage of the proposed compressed sensing algorithm is that it compresses the original high-dimensional protein sequential feature vector into a much lower-dimensional, more condensed space while taking the sparsity of the original signal into account. What makes compressed sensing especially attractive for protein sequence analysis is that the compressed signal can be reconstructed from far fewer measurements than traditional Nyquist sampling theory usually considers necessary. Experimental results demonstrate that the proposed compressed sensing method is powerful for analyzing noisy biological data and reducing redundancy in feature vectors. The proposed method represents a new strategy for handling high-dimensional discrete protein models and has great potential to be extended to many other complicated biological systems. (C) 2011 Elsevier Ltd. All rights reserved.
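The abstract's central claim, that a sparse high-dimensional vector can be projected onto far fewer random measurements and still be recovered, can be illustrated with a minimal sketch. It assumes a Gaussian random measurement matrix and uses orthogonal matching pursuit (OMP) as the recovery step. Both are standard compressed sensing choices swapped in for illustration, not necessarily what the authors used. The dimensions, the sparse "feature vector", and all helper names (`gaussian_matrix`, `omp`, `demo`, and so on) are hypothetical. Only the Python standard library is used, so the block runs without NumPy.

```python
import math
import random


def gaussian_matrix(m, n, seed=0):
    """Return an m x n measurement matrix with i.i.d. N(0, 1/m) entries.

    Gaussian matrices satisfy the restricted isometry property with high
    probability; this choice is an assumption, not necessarily the paper's.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(m)]


def matvec(a, x):
    """Compute the product a @ x for a list-of-rows matrix a."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def solve(mat, rhs):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    k = len(rhs)
    aug = [list(row) + [b] for row, b in zip(mat, rhs)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            f = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, k))
        x[r] = (aug[r][k] - tail) / aug[r][r]
    return x


def least_squares(a, y, support):
    """Least-squares fit of y on the columns of a listed in support (normal equations)."""
    cols = [[row[j] for row in a] for j in support]
    gram = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    rhs = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    return solve(gram, rhs)


def omp(a, y, sparsity):
    """Orthogonal matching pursuit: greedily recover a sparse x with a @ x close to y."""
    n = len(a[0])
    support, coef, residual = [], [], list(y)
    for _ in range(sparsity):
        # Score every column by its correlation with the current residual.
        corr = [abs(sum(row[j] * r for row, r in zip(a, residual))) for j in range(n)]
        best = max((j for j in range(n) if j not in support), key=lambda j: corr[j])
        support.append(best)
        # Re-fit all selected coefficients jointly, then update the residual.
        coef = least_squares(a, y, support)
        fitted = [sum(row[j] * c for j, c in zip(support, coef)) for row in a]
        residual = [yi - fi for yi, fi in zip(y, fitted)]
    x_hat = [0.0] * n
    for j, c in zip(support, coef):
        x_hat[j] = c
    return x_hat


def demo(n=200, m=100, k=3, seed=1):
    """Compress a hypothetical k-sparse n-dim feature vector to m numbers and recover it."""
    rng = random.Random(seed)
    x = [0.0] * n
    for j in rng.sample(range(n), k):
        x[j] = rng.uniform(1.0, 2.0) * rng.choice([-1.0, 1.0])
    phi = gaussian_matrix(m, n, seed=seed + 1)
    y = matvec(phi, x)  # the compressed representation: m values instead of n
    x_hat = omp(phi, y, k)
    return len(y), max(abs(u - v) for u, v in zip(x, x_hat))


if __name__ == "__main__":
    m_used, err = demo()
    print(f"compressed 200 -> {m_used} values, max recovery error {err:.2e}")
```

Recovery succeeds with high probability rather than always, which is why the sketch is phrased as a demonstration over a random draw. The abstract suggests that the compressed vector, not a reconstruction, is what the predictor works with; the recovery step here only illustrates why little information is lost in the compression.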
Pages: 44-52
Page count: 9