SNP markers retrieval for a non-model species: A practical approach

Authors
Arwa Shahin
Thomas van Gurp
Sander A Peters
Richard GF Visser
Jaap M van Tuyl
Paul Arens
Affiliations
[1] Wageningen University and Research Centre, Plant Breeding, 6700 AJ Wageningen
[2] Wageningen University and Research Centre, Bioscience, 6700 AJ Wageningen
[3] Netherlands Institute of Ecology (NIOO-KNAW), Department of Terrestrial Ecology, 6700 AB Wageningen
Keywords
Assembly Quality; CAP3 Assembly; Transcriptome Size; Redundant Contigs; Illumina GoldenGate
DOI
10.1186/1756-0500-5-79
Abstract
Background: SNP (single nucleotide polymorphism) markers are rapidly becoming the markers of choice for breeding applications, driven by developments in next-generation sequencing (NGS) technology. For SNP development with NGS technologies, correct assembly of the huge amounts of sequence data generated is essential. Little is known about how assemblers perform, especially on highly heterogeneous species with high genome complexity, or about how differences between assemblies may affect SNP retrieval. This study tested two assemblers (CAP3 and CLC) on 454 data from four lily genotypes and compared the results with respect to SNP retrieval.

Results: CAP3 assembly resulted in more contigs, fewer reads per contig, and shorter average read lengths compared with CLC. BLAST comparisons showed that CAP3 contigs were highly redundant. In contrast, CLC in rare cases combined paralogs in one contig. Redundant and chimeric contigs may lead to erroneous SNPs. Redundancy can be filtered by BLASTing the selected SNP markers against the contigs and discarding every SNP marker that shows more than one BLAST hit. For chimeric contigs, only four of 2,421 SNP markers turned out to have been selected from chimeric contigs.

Conclusion: In practice, CLC performs better than CAP3 in assembling highly heterogeneous genome sequences, and SNP retrieval is consequently more efficient. Additionally, a simple flow scheme for SNP marker retrieval is proposed that can be used for all non-model species.

© 2012 Shahin et al; licensee BioMed Central Ltd.
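The redundancy filter described in the Results (BLAST the selected SNP markers against the contigs, discard any marker with more than one hit) can be sketched as follows. This is a minimal Python illustration, not the authors' pipeline. It assumes the marker sequences were searched against the contig set with BLAST+ using tabular output (`-outfmt 6`, whose first four columns are query ID, subject ID, percent identity and alignment length). It also assumes a "hit" means a distinct contig that passes identity and alignment-length cut-offs. The names `unique_snp_markers`, `min_identity` and `min_align_len`, and the default threshold values, are hypothetical; the abstract does not specify them.

```python
from collections import defaultdict
from typing import Iterable, List


def unique_snp_markers(
    blast_lines: Iterable[str],
    min_identity: float = 95.0,  # hypothetical cut-off, not from the paper
    min_align_len: int = 50,     # hypothetical cut-off, not from the paper
) -> List[str]:
    """Return IDs of SNP markers whose sequence hits exactly one contig.

    Expects BLAST+ tabular (-outfmt 6) lines: qseqid, sseqid, pident, length, ...
    """
    contigs_hit = defaultdict(set)  # marker ID -> set of distinct contig IDs
    for line in blast_lines:
        line = line.rstrip("\r\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        identity, align_len = float(fields[2]), int(fields[3])
        if identity >= min_identity and align_len >= min_align_len:
            contigs_hit[query].add(subject)
    # A marker designed from one contig should hit only that contig; a second
    # distinct contig suggests a redundant (or paralogous) copy, so drop it.
    return sorted(q for q, contigs in contigs_hit.items() if len(contigs) == 1)


if __name__ == "__main__":
    # Toy BLAST output, invented purely for illustration.
    example = (
        "snp1\tcontig_1\t100.0\t120\t0\t0\t1\t120\t301\t420\t1e-60\t222\n"
        "snp2\tcontig_2\t99.2\t120\t1\t0\t1\t120\t10\t129\t1e-58\t215\n"
        "snp2\tcontig_7\t98.3\t118\t2\t0\t1\t118\t55\t172\t1e-55\t205\n"
        "snp3\tcontig_3\t100.0\t120\t0\t0\t1\t120\t88\t207\t1e-60\t222\n"
    )
    print(unique_snp_markers(example.splitlines()))  # ['snp1', 'snp3']
```

Counting distinct contigs rather than raw alignments means a marker is not discarded merely because it produces two local alignments on its own source contig. That choice is itself an interpretation of "more than one blast hit" and may need adjusting to match the original procedure.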