Fitting a geometric graph to a protein-protein interaction network

被引:85
作者
Higham, Desmond J. [1 ]
Rasajski, Marija [2 ,3 ]
Przulji, Natasa [2 ]
机构
[1] Univ Strathclyde, Dept Math, Glasgow G1 1XH, Lanark, Scotland
[2] Univ Calif Irvine, Dept Comp Sci, Irvine, CA 92697 USA
[3] Univ Belgrade, Fac Elect Engn, Belgrade, Serbia
基金
美国国家科学基金会; 英国工程与自然科学研究理事会;
关键词
D O I
10.1093/bioinformatics/btn079
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
Motivation: Finding a good network null model for proteinprotein interaction (PPI) networks is a fundamental issue. Such a model would provide insights into the interplay between network structure and biological function as well as into evolution. Also, network (graph) models are used to guide biological experiments and discover new biological features. It has been proposed that geometric random graphs are a good model for PPI networks. In a geometric random graph, nodes correspond to uniformly randomly distributed points in a metric space and edges (links) exist between pairs of nodes for which the corresponding points in the metric space are close enough according to some distance norm. Computational experiments have revealed close matches between key topological properties of PPI networks and geometric random graph models. In this work, we push the comparison further by exploiting the fact that the geometric property can be tested for directly. To this end, we develop an algorithm that takes PPI interaction data and embeds proteins into a low-dimensional Euclidean space, under the premise that connectivity information corresponds to Euclidean proximity, as in geometric-random graphs. We judge the sensitivity and specificity of the fit by computing the area under the Receiver Operator Characteristic (ROC) curve. The network embedding algorithm is based on multi-dimensional scaling, with the square root of the path length in a network playing the role of the Euclidean distance in the Euclidean space. The algorithm exploits sparsity for computational efficiency, and requires only a few sparse matrix multiplications, giving a complexity of O(N-2) where N is the number of proteins. Results: The algorithm has been verified in the sense that it successfully rediscovers the geometric structure in artificially constructed geometric networks, even when noise is added by re-wiring some links. Applying the algorithm to 19 publicly available PPI networks of various organisms indicated that: (a) geometric effects are present and (b) two-dimensional Euclidean space is generally as effective as higher dimensional Euclidean space for explaining the connectivity. Testing on a high-confidence yeast data set produced a very strong indication of geometric structure (area under the ROC curve of 0.89), with this network being essentially indistinguishable from a noisy geometric network. Overall, the results add support to the hypothesis that PPI networks have a geometric structure.
引用
收藏
页码:1093 / 1099
页数:7
相关论文
共 45 条
[1]   Emergence of scaling in random networks [J].
Barabási, AL ;
Albert, R .
SCIENCE, 1999, 286 (5439) :509-512
[2]   ASYMPTOTIC NUMBER OF LABELED GRAPHS WITH GIVEN DEGREE SEQUENCES [J].
BENDER, EA ;
CANFIELD, ER .
JOURNAL OF COMBINATORIAL THEORY SERIES A, 1978, 24 (03) :296-307
[3]   The use of the area under the roc curve in the evaluation of machine learning algorithms [J].
Bradley, AP .
PATTERN RECOGNITION, 1997, 30 (07) :1145-1159
[4]  
Cox T. F., 1994, MULTIDIMENSIONAL SCA
[5]  
ERDOS P, 1960, B INT STATIST INST, V38, P343
[6]   Functional organization of the yeast proteome by systematic analysis of protein complexes [J].
Gavin, AC ;
Bösche, M ;
Krause, R ;
Grandi, P ;
Marzioch, M ;
Bauer, A ;
Schultz, J ;
Rick, JM ;
Michon, AM ;
Cruciat, CM ;
Remor, M ;
Höfert, C ;
Schelder, M ;
Brajenovic, M ;
Ruffner, H ;
Merino, A ;
Klein, K ;
Hudak, M ;
Dickson, D ;
Rudi, T ;
Gnau, V ;
Bauch, A ;
Bastuck, S ;
Huhse, B ;
Leutwein, C ;
Heurtier, MA ;
Copley, RR ;
Edelmann, A ;
Querfurth, E ;
Rybin, V ;
Drewes, G ;
Raida, M ;
Bouwmeester, T ;
Bork, P ;
Seraphin, B ;
Kuster, B ;
Neubauer, G ;
Superti-Furga, G .
NATURE, 2002, 415 (6868) :141-147
[7]   A protein interaction map of Drosophila melanogaster [J].
Giot, L ;
Bader, JS ;
Brouwer, C ;
Chaudhuri, A ;
Kuang, B ;
Li, Y ;
Hao, YL ;
Ooi, CE ;
Godwin, B ;
Vitols, E ;
Vijayadamodar, G ;
Pochart, P ;
Machineni, H ;
Welsh, M ;
Kong, Y ;
Zerhusen, B ;
Malcolm, R ;
Varrone, Z ;
Collis, A ;
Minto, M ;
Burgess, S ;
McDaniel, L ;
Stimpson, E ;
Spriggs, F ;
Williams, J ;
Neurath, K ;
Ioime, N ;
Agee, M ;
Voss, E ;
Furtak, K ;
Renzulli, R ;
Aanensen, N ;
Carrolla, S ;
Bickelhaupt, E ;
Lazovatsky, Y ;
DaSilva, A ;
Zhong, J ;
Stanyon, CA ;
Finley, RL ;
White, KP ;
Braverman, M ;
Jarvie, T ;
Gold, S ;
Leach, M ;
Knight, J ;
Shimkets, RA ;
McKenna, MP ;
Chant, J ;
Rothberg, JM .
SCIENCE, 2003, 302 (5651) :1727-1736
[8]  
Golub G. H., 2012, Matrix Computations, V4th
[9]   Review of uses of network and graph theory concepts within proteomics [J].
Grindrod, P ;
Kibble, M .
EXPERT REVIEW OF PROTEOMICS, 2004, 1 (02) :229-238
[10]   Range-dependent random graphs and their application to modeling large small-world Proteome datasets [J].
Grindrod, P .
PHYSICAL REVIEW E, 2002, 66 (06) :7