Detection of recombination events in bacterial genomes from large population samples

Cited by: 147
Authors
Marttinen, Pekka [1]
Hanage, William P. [2]
Croucher, Nicholas J. [3]
Connor, Thomas R. [3]
Harris, Simon R. [3]
Bentley, Stephen D. [3]
Corander, Jukka [4, 5]
Affiliations
[1] Aalto Univ, Dept Biomed Engn & Computat Sci BECS, FI-00076 Aalto, Finland
[2] Harvard Univ, Sch Publ Hlth, Ctr Communicable Dis Dynam, Boston, MA 02115 USA
[3] Wellcome Trust Sanger Inst, Cambridge CB10 1SA, England
[4] Abo Akad Univ, Dept Math, FI-20500 Turku, Finland
[5] Univ Helsinki, Dept Math & Stat, FI-00014 Helsinki, Finland
Funding
Academy of Finland
Keywords
DNA-SEQUENCE ALIGNMENTS; INTERSPECIFIC RECOMBINATION; GENE-TRANSFER; INFERENCE; SIMULATION; EVOLUTION
DOI
10.1093/nar/gkr928
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract
Analysis of important human pathogen populations is currently under transition toward whole-genome sequencing of growing numbers of samples collected on a global scale. Since recombination in bacteria is often an important factor shaping their evolution by enabling resistance elements and virulence traits to rapidly transfer from one evolutionary lineage to another, it is highly beneficial to have access to tools that can detect recombination events. Multiple advanced statistical methods exist for such purposes; however, they are typically limited either to only a few samples or to data from relatively short regions of a total genome. By harnessing the power of recent advances in Bayesian modeling techniques, we introduce here a method for detecting homologous recombination events from whole-genome sequence data for bacterial population samples on a large scale. Our statistical approach can efficiently handle hundreds of whole genome sequenced population samples and identify separate origins of the recombinant sequence, offering an enhanced insight into the diversification of bacterial clones at the level of the whole genome. A data set of 241 whole genome sequences from an important pandemic lineage of Streptococcus pneumoniae is used together with multiple simulated data sets to demonstrate the potential of our approach.
Pages: 12