Defining Loci in Restriction-Based Reduced Representation Genomic Data from Nonmodel Species: Sources of Bias and Diagnostics for Optimal Clustering

Cited by: 86
Authors
Ilut, Daniel C. [1]
Nydam, Marie L. [2]
Hare, Matthew P. [3]
Affiliations
[1] Cornell Univ, Dept Genet & Plant Breeding, Ithaca, NY 14850 USA
[2] Ctr Coll Danville, Div Sci & Math, Danville, KY 40422 USA
[3] Cornell Univ, Dept Nat Resources, Ithaca, NY 14850 USA
Keywords
SNP DISCOVERY; SEQUENCE; GENES
DOI
10.1155/2014/675158
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Subject classification codes
071005; 0836; 090102; 100705
Abstract
Next generation sequencing holds great promise for applications of phylogeography, landscape genetics, and population genomics in wild populations of nonmodel species, but the robustness of inferences hinges on careful experimental design and effective bioinformatic removal of predictable artifacts. Addressing this issue, we use published genomes from a tunicate, stickleback, and soybean to illustrate the potential for bioinformatic artifacts and introduce a protocol to minimize two sources of error expected from similarity-based de novo clustering of stacked reads: the splitting of alleles into different clusters, which creates false homozygosity, and the grouping of paralogs into the same cluster, which creates false heterozygosity. We present an empirical application focused on Ciona savignyi, a tunicate with very high SNP heterozygosity (approximately 0.05), because high diversity challenges the computational efficiency of most existing nonmodel pipelines while also potentially exacerbating paralog artifacts. The simulated and empirical data illustrate the advantages of using higher sequence difference clustering thresholds than is typical and demonstrate the utility of our protocol for efficiently identifying an optimum threshold from data without prior knowledge of heterozygosity. The empirical Ciona savignyi data also highlight null alleles as a potentially large source of false homozygosity in restriction-based reduced representation genomic data.
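The two clustering errors named in the abstract can be illustrated with a toy example. The sketch below is not the authors' protocol or any published pipeline: it assumes a simple single-linkage clustering on Hamming distance, and the three 20 bp sequences (`allele_a`, `allele_b`, `paralog`) and the function names are invented for illustration only. It shows how a low mismatch threshold splits two alleles of one locus into separate clusters, while an overly permissive threshold pulls a paralog into the same cluster.

```python
from itertools import combinations


def hamming(a, b):
    """Count mismatched positions between two equal-length reads."""
    return sum(x != y for x, y in zip(a, b))


def cluster(reads, max_mismatches):
    """Single-linkage clustering: reads within max_mismatches are joined."""
    parent = list(range(len(reads)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(reads)), 2):
        if hamming(reads[i], reads[j]) <= max_mismatches:
            parent[find(i)] = find(j)

    groups = {}
    for i, read in enumerate(reads):
        groups.setdefault(find(i), []).append(read)
    return sorted(groups.values())


# Hypothetical sequences: two alleles of one locus (3 mismatches apart)
# and a paralog from a second locus (6 mismatches from allele_a).
allele_a = "ACGTACGTACGTACGTACGT"
allele_b = "ATGTACGTATGTACGTATGT"
paralog = "ACACACATACATACATACAT"
reads = [allele_a, allele_b, paralog]

for threshold in (2, 3, 6):
    print(threshold, len(cluster(reads, threshold)))
```

With these invented sequences, a threshold of 2 leaves the two alleles in separate clusters (each would look homozygous), a threshold of 3 joins the alleles while keeping the paralog apart, and a threshold of 6 merges all three (the paralog differences would look like heterozygous sites).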
Pages: 9