Short reads and nonmodel species: exploring the complexities of next-generation sequence assembly and SNP discovery in the absence of a reference genome

Cited by: 62

Authors:

Everett, M. V. [1]

Grau, E. D. [1]

Seeb, J. E. [1]

Affiliations:

[1] Univ Washington, Sch Aquat & Fishery Sci, Seattle, WA 98195 USA

Source:

MOLECULAR ECOLOGY RESOURCES | 2011 / Volume 11

Funding:

National Oceanic and Atmospheric Administration (NOAA), USA;

Keywords:

EST; next-generation sequencing; SNP; sockeye salmon; SOLiD; transcriptome; NUCLEOTIDE POLYMORPHISM MARKERS; ATLANTIC SALMON; SINGLE; TROUT; DIFFERENTIATION; SELECTION; RESOURCE; RAINBOW;

DOI:

10.1111/j.1755-0998.2010.02969.x

Chinese Library Classification (CLC) number:

Q5 [Biochemistry]; Q7 [Molecular Biology];

Discipline classification code:

071010 ; 081704 ;

Abstract:

How practical is gene and SNP discovery in a nonmodel species using short read sequences? Next-generation sequencing technologies are being applied to an increasing number of species with no reference genome. For nonmodel species, the cost, availability of existing genetic resources, genome complexity and the planned method of assembly must all be considered when selecting a sequencing platform. Our goal was to examine the feasibility and optimal methodology for SNP and gene discovery in the sockeye salmon (Oncorhynchus nerka) using short read sequences. SOLiD short reads (up to 50 bp) were generated from single- and pooled-tissue transcriptome libraries from ten sockeye salmon. The individuals were from five distinct populations from the Wood River Lakes and Mendeltna Creek, Alaska. As no reference genome was available for sockeye salmon, the SOLiD sequence reads were assembled to publicly available EST reference sequences from sockeye salmon and two closely related species, rainbow trout (Oncorhynchus mykiss) and Atlantic salmon (Salmo salar). Additionally, de novo assembly of the SOLiD data was carried out, and the SOLiD reads were remapped to the de novo contigs. The results from each reference assembly were compared across all references. The number and size of contigs assembled varied with the size of the reference sequences. In silico SNP discovery was carried out on contigs from all four EST references; however, discovery of valid SNPs was most successful using one of the two conspecific references.


Pages: 93-108

Page count: 16

43 references in total

[1]

Allendorf F.W., 1984, P1

[2] Rapid SNP Discovery and Genetic Mapping Using Sequenced RAD Markers [J].

Baird, Nathan A. ;

Etter, Paul D. ;

Atwood, Tressa S. ;

Currey, Mark C. ;

Shiver, Anthony L. ;

Lewis, Zachary A. ;

Selker, Eric U. ;

Cresko, William A. ;

Johnson, Eric A. .

PLOS ONE, 2008, 3 (10)

[3] Normalization and subtraction: Two approaches to facilitate gene discovery [J].

Bonaldo, MDF ;

Lennon, G ;

Soares, MB .

GENOME RESEARCH, 1996, 6 (09) :791-806

[4] Characterization of duplicate gene evolution in the recent natural allopolyploid Tragopogon miscellus by next-generation sequencing and Sequenom iPLEX MassARRAY genotyping [J].

Buggs, Richard J. A. ;

Chamala, Srikar ;

Wu, Wei ;

Gao, Lu ;

May, Gregory D. ;

Schnable, Patrick S. ;

Soltis, Douglas E. ;

Soltis, Pamela S. ;

Barbazuk, W. Brad .

MOLECULAR ECOLOGY, 2010, 19 :132-146

[5] Targeted single nucleotide polymorphism (SNP) discovery in a highly polyploid plant species using 454 sequencing [J].

Bundock, Peter C. ;

Eliott, Frances G. ;

Ablett, Gary ;

Benson, Adam D. ;

Casu, Rosanne E. ;

Aitken, Karen S. ;

Henry, Robert J. .

PLANT BIOTECHNOLOGY JOURNAL, 2009, 7 (04) :347-354

[6] Chicken genomics resource: sequencing and annotation of 35,407 ESTs from single and multiple tissue cDNA libraries and CAP3 assembly of a chicken gene index [J].

Carre, Wilfrid ;

Wang, Xiaofei ;

Porter, Tom E. ;

Nys, Yves ;

Tang, Jianshan ;

Bernberg, Erin ;

Morgan, Robin ;

Burnside, Joan ;

Aggrey, Samuel E. ;

Simon, Jean ;

Cogburn, Larry A. .

PHYSIOLOGICAL GENOMICS, 2006, 25 (03) :514-524

[7]

Collins LJ, 2008, GENOME INFORM SER, V21, P3

[8] Thirty-two single nucleotide polymorphism markers for high-throughput genotyping of sockeye salmon [J].

Elfstrom, Carita M. ;

Smith, Christian T. ;

Seeb, James E. .

MOLECULAR ECOLOGY NOTES, 2006, 6 (04) :1255-1259

[9] Sequencing goes 454 and takes large-scale genomics into the wild [J].

Ellegren, Hans .

MOLECULAR ECOLOGY, 2008, 17 (07) :1629-1631

[10]

Flicek P, 2009, NAT METHODS, V6, pS6, DOI 10.1038/nmeth.1376
