Semi-Supervised Multi-View Learning for Gene Network Reconstruction

Cited by: 33
Authors
Ceci, Michelangelo [1]
Pio, Gianvito [1]
Kuzmanovski, Vladimir [2, 3]
Dzeroski, Saso [2, 3]
Affiliations
[1] Univ Bari Aldo Moro, Dept Comp Sci, I-70125 Bari, Italy
[2] Jozef Stefan Inst, Dept Knowledge Technol, Ljubljana 1000, Slovenia
[3] Jozef Stefan Int Postgrad Sch, Ljubljana 1000, Slovenia
Keywords
LARGE-SCALE ORGANIZATION; REGULATORY NETWORKS; EXPRESSION DATA; CLASSIFICATION; DISCOVERY; INFERENCE; VIEW
DOI
10.1371/journal.pone.0144031
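Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]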
Subject classification codes
07; 0710; 09
Abstract
The task of gene regulatory network reconstruction from high-throughput data has received increasing attention in recent years. As a consequence, many inference methods for solving this task have been proposed in the literature. It has recently been observed, however, that no single inference method performs optimally across all datasets. It has also been shown that integrating predictions from multiple inference methods is more robust and yields high performance across diverse datasets. Inspired by this research, in this paper we propose a machine learning solution that learns to combine predictions from multiple inference methods. While this approach adds complexity to the inference process, we expect it to carry substantial benefits as well. These would come from automatic adaptation to patterns in the outputs of individual inference methods, making it possible to identify regulatory interactions more reliably when these patterns occur. This article demonstrates the benefits (in terms of accuracy of the reconstructed networks) of the proposed method, which exploits an iterative, semi-supervised, ensemble-based algorithm. The algorithm learns to combine the interactions predicted by many different inference methods in the multi-view learning setting. The empirical evaluation of the proposed algorithm on a prokaryotic model organism (E. coli) and on a eukaryotic model organism (S. cerevisiae) clearly shows improved performance over state-of-the-art methods. The results indicate that gene regulatory network reconstruction for the real datasets is more difficult for S. cerevisiae than for E. coli. The software, all the datasets used in the experiments, and all the results are available for download at the following link: http://figshare.com/articles/Semi_supervised_Multi_View_Learning_for_Gene_Network_Reconstruction/1604827.
Pages: 27