Short reads and nonmodel species: exploring the complexities of next-generation sequence assembly and SNP discovery in the absence of a reference genome

Times cited: 62
Authors
Everett, M. V. [1]
Grau, E. D. [1]
Seeb, J. E. [1]
Affiliation
[1] Univ Washington, Sch Aquat & Fishery Sci, Seattle, WA 98195 USA
Funding
National Oceanic and Atmospheric Administration (NOAA)
Keywords
EST; next-generation sequencing; SNP; sockeye salmon; SOLiD; transcriptome; nucleotide polymorphism markers; Atlantic salmon; single; trout; differentiation; selection; resource; rainbow
DOI
10.1111/j.1755-0998.2010.02969.x
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline codes
071010; 081704
Abstract
How practical is gene and SNP discovery in a nonmodel species using short read sequences? Next-generation sequencing technologies are being applied to an increasing number of species with no reference genome. For nonmodel species, the cost, the availability of existing genetic resources, genome complexity and the planned method of assembly must all be considered when selecting a sequencing platform. Our goal was to examine the feasibility and optimal methodology for SNP and gene discovery in sockeye salmon (Oncorhynchus nerka) using short read sequences. SOLiD short reads (up to 50 bp) were generated from single- and pooled-tissue transcriptome libraries from ten sockeye salmon. The individuals came from five distinct populations from the Wood River Lakes and Mendeltna Creek, Alaska. Because no reference genome was available for sockeye salmon, the SOLiD reads were assembled against publicly available EST reference sequences from sockeye salmon and two closely related species, rainbow trout (Oncorhynchus mykiss) and Atlantic salmon (Salmo salar). In addition, the SOLiD data were assembled de novo, and the SOLiD reads were remapped to the de novo contigs. The assembly results obtained with each reference were then compared across all references. The number and size of the assembled contigs varied with the size of the reference sequences. In silico SNP discovery was carried out on contigs from all four EST references; however, discovery of valid SNPs was most successful using one of the two conspecific references.
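The abstract does not state the criteria used for in silico SNP discovery. As a rough illustration of what that step involves once reads are aligned to contigs, the Python sketch below scans a per-position pileup of aligned read bases and flags positions where a second allele is seen often enough. The `call_snps` function, the pileup dictionary layout, and the depth and minor-allele thresholds are hypothetical choices, not the authors' pipeline; the sketch also assumes SOLiD color-space reads have already been aligned and converted to base calls.

```python
from collections import Counter


def call_snps(pileup, min_depth=10, min_minor_count=2, min_minor_freq=0.1):
    """Flag candidate SNPs in a pileup (hypothetical thresholds).

    pileup: dict mapping (contig, position) -> string of read bases
    observed at that position (assumed already decoded from color space).
    """
    snps = []
    for (contig, pos), bases in sorted(pileup.items()):
        # Count only unambiguous nucleotides; ignore N, gaps, etc.
        counts = Counter(b for b in bases.upper() if b in "ACGT")
        depth = sum(counts.values())
        # Skip low-coverage or monomorphic positions.
        if depth < min_depth or len(counts) < 2:
            continue
        (major, _major_n), (minor, minor_n) = counts.most_common(2)
        # Require the second allele in several reads and at a minimum frequency,
        # so that isolated sequencing errors are not reported as SNPs.
        if minor_n >= min_minor_count and minor_n / depth >= min_minor_freq:
            snps.append((contig, pos, major, minor, depth, round(minor_n / depth, 3)))
    return snps


# Usage with made-up pileup data:
pileup = {
    ("contig_1", 101): "A" * 7 + "G" * 4,   # 7 A, 4 G: candidate SNP
    ("contig_1", 102): "C" * 11 + "T",      # single T read: likely error
    ("contig_2", 55): "TTTG",               # depth 4: below min_depth
}
print(call_snps(pileup))  # [('contig_1', 101, 'A', 'G', 11, 0.364)]
```

With pooled-tissue libraries the minor-allele frequency mixes reads from several individuals, so a frequency cutoff like this is only a screening step; candidate SNPs would still need validation by genotyping.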
Pages: 93-108
Number of pages: 16