Active learning for protein function prediction in protein-protein interaction networks

Cited by: 18
Authors
Xiong, Wei [1, 2]
Xie, Luyu [1, 2]
Zhou, Shuigeng [1, 2]
Guan, Jihong [3]
Affiliations
[1] Fudan Univ, Shanghai Key Lab Intelligent Informat Proc, Shanghai 200433, Peoples R China
[2] Fudan Univ, Sch Comp Sci, Shanghai 200433, Peoples R China
[3] Tongji Univ, Dept Comp Sci & Technol, Shanghai 201804, Peoples R China
Keywords
Protein function prediction; Active learning; Collective classification; Protein-protein interaction network; Centrality; GENE ONTOLOGY; BIOLOGICAL FUNCTION; ANNOTATION; CLASSIFICATION; ALIGNMENT; DATABASE
DOI
10.1016/j.neucom.2014.05.075
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
High-throughput technologies have produced vast amounts of protein-protein interaction (PPI) data, and a number of approaches based on PPI networks have been proposed for protein function prediction. However, these approaches do not work well when annotated (labeled) proteins are scarce in the network. To address this issue, we propose an active-learning-based approach that uses graph-based centrality metrics to select suitable candidates for labeling. We first cluster a PPI network with the spectral clustering algorithm and select informative candidates for labeling within each cluster according to a chosen centrality metric, and then apply a collective classification algorithm to predict protein function from these labeled proteins. Experiments on two real datasets demonstrate that the active-learning-based approach achieves better prediction performance by choosing more informative proteins for labeling. The experimental results also show that betweenness centrality is more effective than degree centrality and closeness centrality in most cases. (C) 2014 Elsevier B.V. All rights reserved.
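The pipeline described in the abstract (cluster the network, pick central proteins in each cluster for labeling, then propagate labels) can be illustrated with a minimal sketch. This is not the authors' implementation. Several parts are assumptions made for illustration: the toy network and protein names (`p1` to `p8`), the hand-written `clusters` (standing in for the output of spectral clustering), the `oracle` dictionary (standing in for an expert annotator), centrality computed on the whole network rather than per cluster, and an iterative neighbor majority vote used as a simple stand-in for the collective classification algorithm, which the abstract does not specify. Betweenness centrality is computed with Brandes' algorithm for unweighted graphs.

```python
from collections import Counter, deque

def betweenness(adj):
    """Betweenness centrality of an undirected, unweighted graph (Brandes)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack, queue = [], deque([s])
        preds = {v: [] for v in adj}            # shortest-path predecessors
        sigma = {v: 0 for v in adj}; sigma[s] = 1   # number of shortest paths
        dist = {v: -1 for v in adj}; dist[s] = 0
        while queue:                            # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                            # accumulate dependencies
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}    # each pair was counted twice

def select_candidates(clusters, centrality, per_cluster=1):
    """Pick the most central proteins in each cluster as labeling candidates."""
    picked = []
    for members in clusters:
        ranked = sorted(members, key=lambda v: (-centrality[v], v))
        picked.extend(ranked[:per_cluster])
    return picked

def propagate_labels(adj, seed_labels, max_iters=10):
    """Stand-in for collective classification: iterative neighbor majority vote."""
    labels = dict(seed_labels)
    for _ in range(max_iters):
        updates = {}
        for v in adj:
            if v in seed_labels:
                continue                        # never overwrite queried labels
            votes = Counter(labels[n] for n in adj[v] if n in labels)
            if votes:
                # ties are broken alphabetically so the result is deterministic
                updates[v] = max(sorted(votes), key=votes.get)
        new_labels = {**labels, **updates}
        if new_labels == labels:
            break
        labels = new_labels
    return labels

# Hypothetical toy PPI network: two star-shaped modules joined by one edge.
edges = [("p1", "p2"), ("p1", "p3"), ("p1", "p4"), ("p1", "p5"),
         ("p5", "p6"), ("p5", "p7"), ("p5", "p8")]
adj = {}
for u, v in edges:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)

# Assumed clustering result (the paper obtains clusters by spectral clustering).
clusters = [["p1", "p2", "p3", "p4"], ["p5", "p6", "p7", "p8"]]
# Assumed annotator: returns the true function of a queried protein.
oracle = {"p1": "kinase", "p5": "transporter"}

centrality = betweenness(adj)
selected = select_candidates(clusters, centrality, per_cluster=1)
predicted = propagate_labels(adj, {v: oracle[v] for v in selected})
```

On this toy network the two hub proteins are the ones queried, and every remaining protein inherits the label of the hub it is attached to. Swapping `betweenness` for a degree or closeness function would reproduce the kind of comparison the abstract reports, although the toy graph is far too small to say anything about which metric works better.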
Pages: 44-52
Page count: 9