Defining Loci in Restriction-Based Reduced Representation Genomic Data from Nonmodel Species: Sources of Bias and Diagnostics for Optimal Clustering

Cited by: 86
Authors
Ilut, Daniel C. [1]
Nydam, Marie L. [2]
Hare, Matthew P. [3]
Affiliations
[1] Cornell Univ, Dept Genet & Plant Breeding, Ithaca, NY 14850 USA
[2] Ctr Coll Danville, Div Sci & Math, Danville, KY 40422 USA
[3] Cornell Univ, Dept Nat Resources, Ithaca, NY 14850 USA
Keywords
SNP DISCOVERY; SEQUENCE; GENES;
DOI
10.1155/2014/675158
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Subject classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Next generation sequencing holds great promise for applications of phylogeography, landscape genetics, and population genomics in wild populations of nonmodel species, but the robustness of inferences hinges on careful experimental design and effective bioinformatic removal of predictable artifacts. Addressing this issue, we use published genomes from a tunicate, stickleback, and soybean to illustrate the potential for bioinformatic artifacts and introduce a protocol to minimize two sources of error expected from similarity-based de novo clustering of stacked reads: the splitting of alleles into different clusters, which creates false homozygosity, and the grouping of paralogs into the same cluster, which creates false heterozygosity. We present an empirical application focused on Ciona savignyi, a tunicate with very high SNP heterozygosity (~0.05), because high diversity challenges the computational efficiency of most existing nonmodel pipelines while also potentially exacerbating paralog artifacts. The simulated and empirical data illustrate the advantages of using higher sequence difference clustering thresholds than is typical and demonstrate the utility of our protocol for efficiently identifying an optimum threshold from data without prior knowledge of heterozygosity. The empirical Ciona savignyi data also highlight null alleles as a potentially large source of false homozygosity in restriction-based reduced representation genomic data.
Pages: 9