SuperDCA for genome-wide epistasis analysis

Cited by: 18
Authors
Puranen, Santeri [1,2]
Pesonen, Maiju [1,2]
Pensar, Johan [2]
Xu, Ying Ying [1,2]
Lees, John A. [3]
Bentley, Stephen D. [3]
Croucher, Nicholas J. [4]
Corander, Jukka [2,3,5]
Affiliations
[1] Aalto Univ, Dept Comp Sci, FI-00076 Espoo, Finland
[2] Univ Helsinki, HIIT, Dept Math & Stat, FI-00014 Helsinki, Finland
[3] Wellcome Trust Sanger Inst, Pathogen Genom, Cambridge CB10 1SA, England
[4] Imperial Coll London, Dept Infect Dis Epidemiol, London W2 1PG, England
[5] Univ Oslo, Dept Biostat, N-0317 Oslo, Norway
Funding
European Research Council; Wellcome Trust (UK); Academy of Finland
Keywords
population genomics; epistasis; linkage disequilibrium; direct-coupling analysis; protein structure; structure prediction; mutual information; residue contacts; sequence; identification; mutations; evolution
DOI
10.1099/mgen.0.000184
Chinese Library Classification
Q3 [Genetics]
Subject classification codes
071007; 090102
Abstract
The potential for genome-wide modelling of epistasis has recently surfaced, given the possibility of sequencing densely sampled populations and the emerging families of statistical interaction models. Direct coupling analysis (DCA) has previously been shown to yield valuable predictions for single protein structures, and has recently been extended to genome-wide analysis of bacteria, identifying novel interactions in the co-evolution between resistance, virulence and core genome elements. However, earlier computational DCA methods have not scaled to fitting a model simultaneously to 10^4 to 10^5 polymorphisms, the amount of core genomic variation observed in many bacterial species. Here, we introduce a novel inference method (SuperDCA) that employs a new scoring principle, efficient parallelization, optimization and filtering on phylogenetic information to achieve scalability for up to 10^5 polymorphisms. Using two large population samples of Streptococcus pneumoniae, we demonstrate the ability of SuperDCA to make additional significant biological findings about this major human pathogen. We also show that our method can uncover signals of selection that are not detectable by genome-wide association analysis, even though our analysis does not require phenotypic measurements. SuperDCA thus holds considerable potential for building understanding of numerous organisms at a systems-biology level.
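The scalability problem described in the abstract comes down to how fast the number of pairwise couplings grows with the number of polymorphic sites. The Python sketch below counts unordered site pairs and the coupling parameters of a generic pairwise (Potts-type) model in which every pair of sites carries a q x q coupling matrix. It is an illustration only, not SuperDCA's implementation; the function names and the alphabet size q = 5 (A, C, G, T plus a gap/missing state) are assumptions.

```python
def n_pairs(L: int) -> int:
    """Number of unordered site pairs (i, j) with i < j among L polymorphic sites."""
    return L * (L - 1) // 2


def n_coupling_params(L: int, q: int) -> int:
    """Coupling parameters in a generic pairwise model: one q x q matrix per site pair.

    No gauge fixing is applied; fixing a gauge would reduce this to (q - 1)^2 per pair.
    """
    return n_pairs(L) * q * q


def main() -> None:
    q = 5  # assumption: A, C, G, T plus one gap/missing-data state
    for L in (10**4, 10**5):
        print(f"L={L:,}: {n_pairs(L):,} pairs, {n_coupling_params(L, q):,} coupling parameters")


if __name__ == "__main__":
    main()
```

Going from 10^4 to 10^5 sites multiplies the pair count by roughly 100 (about 5.0 x 10^7 versus 5.0 x 10^9 pairs), which is why quadratic-cost inference needs parallelization and pair filtering at this scale.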
Pages: 12