High-dimensional structure learning of binary pairwise Markov networks: A comparative numerical study

Cited by: 4
Authors
Pensar, Johan [1]
Xu, Yingying [2,3]
Puranen, Santeri [2,3,5]
Pesonen, Maiju [2,3]
Kabashima, Yoshiyuki [4]
Corander, Jukka [1,5,6]
Affiliations
[1] Univ Helsinki, Fac Sci, Dept Math & Stat, Helsinki, Finland
[2] Aalto Univ, Dept Comp Sci, Espoo, Finland
[3] Univ Helsinki, Dept Comp Sci, Helsinki, Finland
[4] Tokyo Inst Technol, Dept Math & Comp Sci, Tokyo, Japan
[5] Univ Oslo, Dept Biostat, Oslo, Norway
[6] Wellcome Sanger Inst, Parasites & Microbes, Cambridge, England
Funding
Academy of Finland; European Research Council
Keywords
Markov network; Ising model; Structure learning; Mutual information; Pseudo-likelihood; Gibbs sampler; ISING-MODEL SELECTION; PROTEIN-STRUCTURE;
DOI
10.1016/j.csda.2019.06.012
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Learning the undirected graph structure of a Markov network from data is a problem that has received a lot of attention during the last few decades. As a result of the general applicability of the model class, a myriad of methods have been developed in parallel in several research fields. Recently, as the size of the considered systems has increased, the focus of new methods has been shifted towards the high-dimensional domain. In particular, introduction of the pseudo-likelihood function has pushed the limits of score-based methods which were originally based on the likelihood function. At the same time, methods based on simple pairwise tests have been developed to meet the challenges arising from increasingly large data sets in computational biology. Apart from being applicable to high-dimensional problems, methods based on the pseudo-likelihood and pairwise tests are fundamentally very different. To compare the accuracy of the different types of methods, an extensive numerical study is performed on data generated by binary pairwise Markov networks. A parallelizable Gibbs sampler, based on restricted Boltzmann machines, is proposed as a tool to efficiently sample from sparse high-dimensional networks. The results of the study show that pairwise methods can be more accurate than pseudo-likelihood methods in settings often encountered in high-dimensional structure learning applications. (C) 2019 Elsevier B.V. All rights reserved.
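As a rough illustration of the setting described in the abstract, the sketch below draws samples from a small binary pairwise Markov network (an Ising model with states in {-1, +1}) and scores variable pairs by empirical mutual information, the kind of simple pairwise statistic the abstract contrasts with pseudo-likelihood methods. It is a plain single-site Gibbs sampler, not the parallelizable restricted-Boltzmann-machine-based sampler proposed in the paper. The function names, the parameterization via `J` and `h`, and the burn-in and thinning defaults are all assumptions made for this example.

```python
import math
import random


def gibbs_sample(J, h, n_samples, burn_in=200, thin=5, seed=0):
    """Single-site Gibbs sampler (illustrative, not the paper's sampler).

    Target: p(x) proportional to
        exp(sum_i h[i]*x[i] + sum_{i<j} J[i][j]*x[i]*x[j]),  x[i] in {-1, +1}.
    J is assumed symmetric with a zero diagonal.
    """
    rng = random.Random(seed)
    d = len(h)
    x = [rng.choice((-1, 1)) for _ in range(d)]
    samples = []
    for sweep in range(burn_in + n_samples * thin):
        for i in range(d):
            # Local field acting on node i given all other nodes.
            field = h[i] + sum(J[i][j] * x[j] for j in range(d) if j != i)
            # Conditional probability that x[i] = +1.
            p_plus = 1.0 / (1.0 + math.exp(-2.0 * field))
            x[i] = 1 if rng.random() < p_plus else -1
        # Keep every `thin`-th sweep after the burn-in period.
        if sweep >= burn_in and (sweep - burn_in) % thin == thin - 1:
            samples.append(tuple(x))
    return samples


def mutual_information(samples, i, j):
    """Empirical mutual information (in nats) between variables i and j."""
    n = len(samples)
    joint = {}
    for s in samples:
        key = (s[i], s[j])
        joint[key] = joint.get(key, 0) + 1
    # Marginal frequencies computed from the joint counts.
    p_i = {a: sum(c for (u, _), c in joint.items() if u == a) / n for a in (-1, 1)}
    p_j = {b: sum(c for (_, v), c in joint.items() if v == b) / n for b in (-1, 1)}
    mi = 0.0
    for (a, b), c in joint.items():
        p = c / n
        mi += p * math.log(p / (p_i[a] * p_j[b]))
    return mi
```

For example, on a four-node chain with couplings of 0.8 on the edges (0,1), (1,2), and (2,3) and zero external fields, the adjacent pair (0,1) would be expected to receive a higher mutual information score than the distant pair (0,3). Ranking pairs by such a score and thresholding is one simple way to propose an edge set, under the assumptions stated above.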
Pages: 62-76
Number of pages: 15