Detecting Identity by Descent and Estimating Genotype Error Rates in Sequence Data

Cited by: 130
Authors
Browning, Brian L. [1]
Browning, Sharon R. [2]
Affiliations
[1] Univ Washington, Dept Med, Div Med Genet, Seattle, WA 98195 USA
[2] Univ Washington, Dept Biostat, Seattle, WA 98195 USA
Funding
Wellcome Trust (UK); National Institutes of Health (USA)
DOI
10.1016/j.ajhg.2013.09.014
Chinese Library Classification number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Existing methods for identity by descent (IBD) segment detection were designed for SNP array data, not sequence data. Sequence data have a much higher density of genetic variants and a different allele frequency distribution, and can have higher genotype error rates. Consequently, best practices for IBD detection in SNP array data do not necessarily carry over to sequence data. We present a method, IBDseq, for detecting IBD segments in sequence data and a method, SEQERR, for estimating genotype error rates at low-frequency variants by using detected IBD. The IBDseq method estimates probabilities of genotypes observed with error for each pair of individuals under IBD and non-IBD models. The ratio of estimated probabilities under the two models gives a LOD score for IBD. We evaluate several IBD detection methods that are fast enough for application to sequence data (IBDseq, Beagle Refined IBD, PLINK, and GERMLINE) under multiple parameter settings, and we show that IBDseq achieves high power and accuracy for IBD detection in sequence data. The SEQERR method estimates genotype error rates by comparing observed and expected rates of pairs of homozygote and heterozygote genotypes at low-frequency variants in IBD segments. We demonstrate the accuracy of SEQERR in simulated data, and we apply the method to estimate genotype error rates in sequence data from the UK10K and 1000 Genomes projects.
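The LOD-score idea in the abstract can be illustrated with a small Python sketch. This is not the IBDseq implementation. It assumes a single biallelic marker, Hardy-Weinberg genotype frequencies, an IBD state in which the two individuals share exactly one allele, and a symmetric error model in which a true genotype is misread as each of the other two genotypes with probability `eps / 2`. All function names (`hwe`, `ibd1_joint`, `error_matrix`, `observed_prob`, `marker_lod`) are hypothetical, and the LOD is taken as the base-10 logarithm of the probability ratio.

```python
import math
from itertools import product


def hwe(p):
    """Hardy-Weinberg genotype frequencies for 0, 1, 2 copies of an allele with frequency p."""
    q = 1.0 - p
    return [q * q, 2.0 * p * q, p * p]


def ibd1_joint(p):
    """Joint true-genotype probabilities when the pair shares exactly one allele IBD (assumed model)."""
    freq = [1.0 - p, p]
    joint = [[0.0] * 3 for _ in range(3)]
    # s is the shared allele; a1 and a2 are the non-shared alleles of each individual.
    for s, a1, a2 in product((0, 1), repeat=3):
        joint[s + a1][s + a2] += freq[s] * freq[a1] * freq[a2]
    return joint


def error_matrix(eps):
    """err[true][observed]: assumed symmetric genotype error model."""
    return [[1.0 - eps if t == o else eps / 2.0 for o in range(3)] for t in range(3)]


def observed_prob(joint, err, g1, g2):
    """Probability of the observed genotype pair, summing over all true genotype pairs."""
    return sum(
        joint[t1][t2] * err[t1][g1] * err[t2][g2]
        for t1 in range(3)
        for t2 in range(3)
    )


def marker_lod(g1, g2, p, eps):
    """log10 ratio of the observed-pair probability under the IBD and non-IBD models."""
    err = error_matrix(eps)
    h = hwe(p)
    non_ibd = [[h[a] * h[b] for b in range(3)] for a in range(3)]
    p_ibd = observed_prob(ibd1_joint(p), err, g1, g2)
    p_non = observed_prob(non_ibd, err, g1, g2)
    return math.log10(p_ibd / p_non)


# Two heterozygotes at a rare variant support IBD; opposite homozygotes argue against it.
lod_shared_rare = marker_lod(1, 1, p=0.01, eps=0.001)
lod_opposite_hom = marker_lod(0, 2, p=0.2, eps=0.001)
```

Under these assumptions, a shared rare heterozygous genotype yields a positive score and a pair of opposite homozygotes yields a negative one. With `eps` greater than zero, an apparent inconsistency lowers the score instead of ruling out IBD outright.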
Pages: 840-851
Page count: 12