Ensemble non-negative matrix factorization methods for clustering protein-protein interactions

Times cited: 65
Authors
Greene, Derek [1]
Cagney, Gerard [2,3]
Krogan, Nevan [3]
Cunningham, Padraig [1]
Affiliations
[1] Univ Coll Dublin, Sch Informat & Comp Sci, Dublin, Ireland
[2] Univ Coll Dublin, Conway Inst Biomol & Biomed Res, Dublin, Ireland
[3] Univ Calif San Francisco, Dept Cellular & Mol Pharmacol, San Francisco, CA 94143 USA
DOI
10.1093/bioinformatics/btn286
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: When working with large-scale protein interaction data, an important analysis task is the assignment of pairs of proteins to groups that correspond to higher-order assemblies. Previously, a common approach to this problem has been to apply standard hierarchical clustering methods to identify such groups. Here we propose a new algorithm for aggregating a diverse collection of matrix factorizations to produce a more informative clustering, which takes the form of a soft hierarchy of clusters.

Results: We apply the proposed Ensemble non-negative matrix factorization (NMF) algorithm to a high-quality assembly of binary protein interactions derived from two proteome-wide studies in yeast. Our experimental evaluation demonstrates that the algorithm lends itself to discovering small localized structures in this data, which correspond to known functional groupings of complexes. In addition, we show that the algorithm supports the assignment of putative functions to previously uncharacterized proteins, for instance the protein YNR024W, which may be an uncharacterized component of the exosome.
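The abstract does not spell out how the factorizations are aggregated, so the following is only a minimal, generic sketch of the underlying idea, not the authors' Ensemble NMF algorithm. It factorizes a non-negative interaction matrix V ≈ WH several times with Lee and Seung's multiplicative updates, using different random seeds and ranks, turns each row of W into a soft membership distribution over factors, and averages the resulting protein-by-protein co-membership scores. This is a soft variant of the consensus-matrix idea of Brunet et al. [2]. All function names, the co-membership score, the iteration count, and the toy matrix are hypothetical choices for illustration, and the sketch stops at the consensus matrix rather than building the soft hierarchy described above.

```python
import random

# Hypothetical sketch: a generic NMF ensemble with a soft consensus matrix,
# not the Ensemble NMF algorithm of Greene et al.
Matrix = list[list[float]]


def transpose(A: Matrix) -> Matrix:
    return [list(col) for col in zip(*A)]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def nmf(V: Matrix, k: int, iters: int = 500, seed: int = 0, eps: float = 1e-9):
    """Approximate non-negative V (n x m) by W (n x k) @ H (k x m) using
    Lee-Seung multiplicative updates for the squared Frobenius error."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() for _ in range(k)] for _ in range(n)]
    H = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H); eps guards against division by zero.
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return W, H


def soft_memberships(W: Matrix, eps: float = 1e-12) -> Matrix:
    """Normalise each row of W so a protein gets a distribution over factors."""
    return [[w / (sum(row) + eps) for w in row] for row in W]


def ensemble_consensus(V: Matrix, ranks: list[int], seeds: list[int]) -> Matrix:
    """Average, over all (rank, seed) runs, the hypothetical soft co-membership
    score sum_a p_i(a) * p_j(a) of every protein pair (i, j)."""
    n = len(V)
    C = [[0.0] * n for _ in range(n)]
    runs = 0
    for k in ranks:
        for seed in seeds:
            W, _ = nmf(V, k, seed=seed)
            P = soft_memberships(W)
            for i in range(n):
                for j in range(i, n):
                    s = sum(a * b for a, b in zip(P[i], P[j]))
                    C[i][j] += s
                    if i != j:
                        C[j][i] += s
            runs += 1
    return [[c / runs for c in row] for row in C]


if __name__ == "__main__":
    # Toy symmetric interaction matrix: proteins 0-2 and 3-5 form two groups
    # (self-interactions on the diagonal are an assumption of this example).
    V = [[1.0 if (i < 3) == (j < 3) else 0.0 for j in range(6)] for i in range(6)]
    C = ensemble_consensus(V, ranks=[2, 3], seeds=[0, 1, 2])
    for row in C:
        print(" ".join(f"{c:.2f}" for c in row))
```

On real data, V would be the symmetric (possibly weighted) adjacency matrix of the interaction network. An uncharacterized protein's highest-scoring consensus partners could then suggest, but not establish, a candidate complex membership, which is the kind of inference the abstract reports for YNR024W.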
Pages: 1722-1728
Page count: 7
References (26 in total)
[1]   Bagging predictors [J].
Breiman, L .
MACHINE LEARNING, 1996, 24 (02) :123-140
[2]   Metagenes and molecular pattern discovery using matrix factorization [J].
Brunet, JP ;
Tamayo, P ;
Golub, TR ;
Mesirov, JP .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2004, 101 (12) :4164-4169
[3]   SGD: Saccharomyces Genome Database [J].
Cherry, JM ;
Adler, C ;
Ball, C ;
Chervitz, SA ;
Dwight, SS ;
Hester, ET ;
Jia, YK ;
Juvik, G ;
Roe, T ;
Schroeder, M ;
Weng, SA ;
Botstein, D .
NUCLEIC ACIDS RESEARCH, 1998, 26 (01) :73-79
[4]   COLLINS S, 2007, MOL CELL PROTEOMICS
[5]   Cluster merging and splitting in hierarchical clustering algorithms [J].
Ding, C ;
He, XF .
2002 IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2002, :139-146
[6]   Ding C, 2005, SIAM PROC S, P606
[7]   Proteome survey reveals modularity of the yeast cell machinery [J].
Gavin, AC ;
Aloy, P ;
Grandi, P ;
Krause, R ;
Boesche, M ;
Marzioch, M ;
Rau, C ;
Jensen, LJ ;
Bastuck, S ;
Dümpelfeld, B ;
Edelmann, A ;
Heurtier, MA ;
Hoffman, V ;
Hoefert, C ;
Klein, K ;
Hudak, M ;
Michon, AM ;
Schelder, M ;
Schirle, M ;
Remor, M ;
Rudi, T ;
Hooper, S ;
Bauer, A ;
Bouwmeester, T ;
Casari, G ;
Drewes, G ;
Neubauer, G ;
Rick, JM ;
Kuster, B ;
Bork, P ;
Russell, RB ;
Superti-Furga, G .
NATURE, 2006, 440 (7084) :631-636
[8]   Cluster structure inference based on clustering stability with applications to microarray data analysis [J].
Giurcaneanu, CD ;
Tabus, I .
EURASIP JOURNAL ON APPLIED SIGNAL PROCESSING, 2004, 2004 (01) :64-80
[9]   Networking proteins in yeast [J].
Hazbun, TR ;
Fields, S .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2001, 98 (08) :4277-4278
[10]   Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis [J].
Kim, Hyunsoo ;
Park, Haesun .
BIOINFORMATICS, 2007, 23 (12) :1495-1502