Short reads and nonmodel species: exploring the complexities of next-generation sequence assembly and SNP discovery in the absence of a reference genome

Cited by: 62
Authors
Everett, M. V. [1]
Grau, E. D. [1]
Seeb, J. E. [1]
Affiliations
[1] Univ Washington, Sch Aquat & Fishery Sci, Seattle, WA 98195 USA
Funding
US National Oceanic and Atmospheric Administration (NOAA);
Keywords
EST; next-generation sequencing; SNP; sockeye salmon; SOLiD; transcriptome; NUCLEOTIDE POLYMORPHISM MARKERS; ATLANTIC SALMON; SINGLE; TROUT; DIFFERENTIATION; SELECTION; RESOURCE; RAINBOW;
DOI
10.1111/j.1755-0998.2010.02969.x
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Subject classification codes
071010 ; 081704 ;
Abstract
How practical is gene and SNP discovery in a nonmodel species using short read sequences? Next-generation sequencing technologies are being applied to an increasing number of species with no reference genome. For nonmodel species, the cost, availability of existing genetic resources, genome complexity and the planned method of assembly must all be considered when selecting a sequencing platform. Our goal was to examine the feasibility and optimal methodology for SNP and gene discovery in the sockeye salmon (Oncorhynchus nerka) using short read sequences. SOLiD short reads (up to 50 bp) were generated from single- and pooled-tissue transcriptome libraries from ten sockeye salmon. The individuals were from five distinct populations from the Wood River Lakes and Mendeltna Creek, Alaska. As no reference genome was available for sockeye salmon, the SOLiD sequence reads were assembled to publicly available EST reference sequences from sockeye salmon and two closely related species, rainbow trout (Oncorhynchus mykiss) and Atlantic salmon (Salmo salar). Additionally, de novo assembly of the SOLiD data was carried out, and the SOLiD reads were remapped to the de novo contigs. The results from each reference assembly were compared across all references. The number and size of contigs assembled varied with the size of the reference sequences. In silico SNP discovery was carried out on contigs from all four EST references; however, discovery of valid SNPs was most successful using one of the two conspecific references.
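To make the in silico SNP discovery step concrete, the following is a minimal sketch of how candidate SNPs might be flagged once reads have been mapped to reference contigs. It is not the authors' pipeline. The function name `call_snp_candidates`, the pileup dictionary layout, and the thresholds `min_depth` and `min_minor_freq` are all assumptions made for illustration, and the sketch works on plain base calls rather than SOLiD color-space data.

```python
from collections import Counter


def call_snp_candidates(pileup, min_depth=8, min_minor_freq=0.2):
    """Flag positions where a second allele is well supported.

    pileup: dict mapping (contig, position) -> string of aligned base calls.
    The layout and both thresholds are illustrative assumptions.
    """
    candidates = []
    for (contig, pos), bases in sorted(pileup.items()):
        # Count only unambiguous base calls.
        counts = Counter(b for b in bases.upper() if b in "ACGT")
        depth = sum(counts.values())
        # Skip positions with too little coverage or a single allele.
        if depth < min_depth or len(counts) < 2:
            continue
        (major, _), (minor, minor_n) = counts.most_common(2)
        minor_freq = minor_n / depth
        if minor_freq >= min_minor_freq:
            candidates.append(
                (contig, pos, major, minor, depth, round(minor_freq, 3))
            )
    return candidates


# Hypothetical example data: three positions on one contig.
example_pileup = {
    ("contig1", 10): "AAAAAGGGGG",  # two well-supported alleles
    ("contig1", 11): "AAAAAAAAAG",  # minor allele too rare, likely an error
    ("contig1", 12): "AC",          # coverage too low to judge
}
result = call_snp_candidates(example_pileup)
```

In this toy input only position 10 passes both filters. Thresholds of this kind trade sensitivity against false positives, which matters in a transcriptome of a species with many duplicated genes, so any real values would need to be validated empirically.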
Pages: 93-108
Number of pages: 16