An Improved Protocol for Sequencing of Repetitive Genomic Regions and Structural Variations Using Mutagenesis and Next Generation Sequencing

Cited by: 5
Authors
Sipos, Botond [1]
Massingham, Tim [1]
Stuetz, Adrian M. [2]
Goldman, Nick [1]
Affiliations
[1] European Bioinformatics Institute (EMBL-EBI), Cambridge, England
[2] European Molecular Biology Laboratory, Genome Biology Research Unit, Heidelberg, Germany
Source
PLOS ONE | 2012 / Vol. 7 / Issue 08
Keywords
EVOLUTION; ALGORITHMS; ALIGNMENT; REPEATS; VELVET; READS
DOI
10.1371/journal.pone.0043359
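Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]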
Discipline classification codes
07; 0710; 09
Abstract
The rise of Next Generation Sequencing (NGS) technologies has transformed de novo genome sequencing into an accessible research tool, but obtaining high-quality eukaryotic genome assemblies remains a challenge, mostly due to the abundance of repetitive elements. These elements also make it difficult to study nucleotide polymorphism in repetitive regions, including certain types of structural variation. One solution proposed for resolving such regions is Sequence Assembly aided by Mutagenesis (SAM), which relies on the fact that introducing enough random mutations breaks the repetitive structure, making assembly possible. Sequencing many different mutated copies permits the sequence of the repetitive region to be inferred by consensus methods. However, this approach relies on molecular cloning to isolate and amplify individual mutant copies, making it hard to scale up for use with high-throughput sequencing technologies. To address this problem, we propose NG-SAM, a modified version of the SAM protocol that relies on PCR and dilution steps only, coupled to an NGS workflow. NG-SAM therefore has the potential to be scaled up, e.g. using emerging microfluidics technologies. We built a realistic simulation pipeline to study the feasibility of NG-SAM, and our results suggest that under appropriate experimental conditions the approach might be successfully put into practice. Moreover, our simulations suggest that NG-SAM is capable of robustly reconstructing a wide range of potential target sequences of varying lengths and repetitive structures.
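The consensus idea behind SAM, as summarized in the abstract, can be illustrated with a toy sketch. This is not the authors' simulation pipeline or the NG-SAM protocol: it assumes substitution-only mutations, copies that are already perfectly aligned, and an arbitrary mutation rate and copy number. The names `mutate`, `consensus`, and `target` are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter

BASES = "ACGT"


def mutate(seq, rate, rng):
    """Return a copy of seq in which each base is substituted
    independently with probability `rate` (substitutions only)."""
    out = []
    for base in seq:
        if rng.random() < rate:
            # Pick one of the three other bases at random.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def consensus(copies):
    """Column-wise majority vote over equal-length, pre-aligned copies."""
    return "".join(Counter(column).most_common(1)[0][0]
                   for column in zip(*copies))


rng = random.Random(0)           # fixed seed so the sketch is reproducible
target = "ACGTTGCA" * 6          # toy tandem repeat (assumed, not from the paper)
copies = [mutate(target, 0.1, rng) for _ in range(25)]
recovered = consensus(copies)

print("distinct mutated copies:", len(set(copies)))
print("target recovered:", recovered == target)
```

Each mutated copy differs from the others at random positions, which is what makes the repeat units distinguishable during assembly, while the majority vote across copies cancels the introduced mutations. A real workflow would also need assembly of short reads per copy and a multiple alignment step before any consensus call.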
Pages: 9