An Improved Protocol for Sequencing of Repetitive Genomic Regions and Structural Variations Using Mutagenesis and Next Generation Sequencing

Cited by: 5

Authors:

Sipos, Botond [1]

Massingham, Tim [1]

Stuetz, Adrian M. [2]

Goldman, Nick [1]

Affiliations:

[1] European Bioinformatics Institute (EMBL-EBI), Cambridge, England

[2] European Molecular Biology Laboratory (EMBL), Genome Biology Research Unit, Heidelberg, Germany

Source:

PLOS ONE | 2012 / Vol. 7 / Issue 8

Keywords:

EVOLUTION; ALGORITHMS; ALIGNMENT; REPEATS; VELVET; READS

DOI:

10.1371/journal.pone.0043359

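Chinese Library Classification (CLC):

O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]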

Discipline codes:

07; 0710; 09

Abstract:

The rise of Next Generation Sequencing (NGS) technologies has made de novo genome sequencing an accessible research tool, but obtaining high-quality eukaryotic genome assemblies remains a challenge, mostly because of the abundance of repetitive elements. These elements also make it difficult to study nucleotide polymorphism in repetitive regions, including certain types of structural variation. One proposed solution for resolving such regions is Sequence Assembly aided by Mutagenesis (SAM), which relies on the fact that introducing enough random mutations breaks the repetitive structure, making assembly possible. Sequencing many different mutated copies then allows the sequence of the repetitive region to be inferred by consensus methods. However, this approach relies on molecular cloning to isolate and amplify individual mutant copies, which makes it hard to scale up for use with high-throughput sequencing technologies. To address this problem, we propose NG-SAM, a modified version of the SAM protocol that relies only on PCR and dilution steps, coupled to an NGS workflow. NG-SAM therefore has the potential to be scaled up, e.g., using emerging microfluidics technologies. We built a realistic simulation pipeline to study the feasibility of NG-SAM, and our results suggest that under appropriate experimental conditions the approach could be put into practice successfully. Moreover, our simulations suggest that NG-SAM can robustly reconstruct a wide range of potential target sequences of varying lengths and repetitive structures.
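The two ideas behind SAM summarized above (random mutations break up repeats; many mutant copies recover the original by consensus) can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' NG-SAM simulation pipeline: it assumes substitution-only mutagenesis at a uniform per-base rate (the hypothetical parameter `rate`), ignores PCR, sequencing error and assembly entirely, and treats every mutant copy as already aligned to the others, so consensus reduces to a per-column majority vote. The helper names `mutate`, `repeated_kmers` and `consensus` are illustrative only.

```python
import random
from collections import Counter

BASES = "ACGT"


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Substitute each base with a different random base with probability `rate`.

    Toy model (assumption): substitutions only, no indels, so every mutant
    copy stays position-aligned with the original sequence.
    """
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def repeated_kmers(seq: str, k: int) -> int:
    """Count distinct k-mers that occur more than once (a crude measure of repetitiveness)."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return sum(1 for c in counts.values() if c > 1)


def consensus(copies: list[str]) -> str:
    """Column-wise majority vote over equal-length, position-aligned copies."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*copies))


if __name__ == "__main__":
    rng = random.Random(1)
    unit = "".join(rng.choice(BASES) for _ in range(50))
    target = unit * 4  # hypothetical 200 bp tandem repeat
    mutants = [mutate(target, 0.1, rng) for _ in range(25)]
    print("repeated 20-mers, original:  ", repeated_kmers(target, 20))
    print("repeated 20-mers, one mutant:", repeated_kmers(mutants[0], 20))
    print("consensus recovers target:   ", consensus(mutants) == target)
```

With these settings a single mutant copy should retain far fewer repeated 20-mers than the unmutated tandem repeat, while the majority vote over 25 mutants will usually reproduce the original sequence. Real mutagenesis also produces indels, so mutant copies would first need a multiple sequence alignment (e.g., with MUSCLE) before any column-wise vote.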

Pages: 9

References: 33
