A synthetic-diploid benchmark for accurate variant-calling evaluation

Cited by: 129
Authors
Li, Heng [1]
Bloom, Jonathan M. [1]
Farjoun, Yossi [1]
Fleharty, Mark [1]
Gauthier, Laura [1]
Neale, Benjamin [1, 2]
MacArthur, Daniel [1, 2]
Affiliations
[1] Broad Inst Harvard & MIT, Cambridge, MA 02142 USA
[2] Massachusetts Gen Hosp, Analyt & Translat Genet Unit, Boston, MA 02114 USA
Funding
National Institutes of Health (NIH), USA;
Keywords
GENOME ASSEMBLIES; DISCOVERY; FRAMEWORK; SNP;
DOI
10.1038/s41592-018-0054-7
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Existing benchmark datasets for use in evaluating variant-calling accuracy are constructed from a consensus of known short-variant callers, and they are thus biased toward easy regions that are accessible by these algorithms. We derived a new benchmark dataset from the de novo PacBio assemblies of two fully homozygous human cell lines, which provides a relatively more accurate and less biased estimate of small-variant-calling error rates in a realistic context.
Pages: 595-+
Page count: 6
Related papers
21 in total
[1]   A global reference for human genetic variation [J].
Altshuler, David M. ;
Durbin, Richard M. ;
Abecasis, Goncalo R. ;
Bentley, David R. ;
Chakravarti, Aravinda ;
Clark, Andrew G. ;
Donnelly, Peter ;
Eichler, Evan E. ;
Flicek, Paul ;
Gabriel, Stacey B. ;
Gibbs, Richard A. ;
Green, Eric D. ;
Hurles, Matthew E. ;
Knoppers, Bartha M. ;
Korbel, Jan O. ;
Lander, Eric S. ;
Lee, Charles ;
Lehrach, Hans ;
Mardis, Elaine R. ;
Marth, Gabor T. ;
McVean, Gil A. ;
Nickerson, Deborah A. ;
Wang, Jun ;
Wilson, Richard K. ;
Boerwinkle, Eric ;
Doddapaneni, Harsha ;
Han, Yi ;
Korchina, Viktoriya ;
Kovar, Christie ;
Lee, Sandra ;
Muzny, Donna ;
Reid, Jeffrey G. ;
Zhu, Yiming ;
Chang, Yuqi ;
Feng, Qiang ;
Fang, Xiaodong ;
Guo, Xiaosen ;
Jian, Min ;
Jiang, Hui ;
Jin, Xin ;
Lan, Tianming ;
Li, Guoqing ;
Li, Jingxiang ;
Li, Yingrui ;
Liu, Shengmao ;
Liu, Xiao ;
Lu, Yao ;
Ma, Xuedi ;
Tang, Meifang ;
Wang, Bo .
NATURE, 2015, 526 (7571) :68-+
[2]  
[Anonymous], 2015, bioRxiv
[3]  
Chin CS, 2016, NAT METHODS, V13, P1050, DOI 10.1038/nmeth.4035
[4]  
Chin CS, 2013, NAT METHODS, V10, P563, DOI 10.1038/nmeth.2474
[5]   A framework for variation discovery and genotyping using next-generation DNA sequencing data [J].
DePristo, Mark A. ;
Banks, Eric ;
Poplin, Ryan ;
Garimella, Kiran V. ;
Maguire, Jared R. ;
Hartl, Christopher ;
Philippakis, Anthony A. ;
del Angel, Guillermo ;
Rivas, Manuel A. ;
Hanna, Matt ;
McKenna, Aaron ;
Fennell, Tim J. ;
Kernytsky, Andrew M. ;
Sivachenko, Andrey Y. ;
Cibulskis, Kristian ;
Gabriel, Stacey B. ;
Altshuler, David ;
Daly, Mark J. .
NATURE GENETICS, 2011, 43 (05) :491-+
[6]   A reference data set of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree [J].
Eberle, Michael A. ;
Fritzilas, Epameinondas ;
Krusche, Peter ;
Kallberg, Morten ;
Moore, Benjamin L. ;
Bekritsky, Mitchell A. ;
Iqbal, Zamin ;
Chuang, Han-Yu ;
Humphray, Sean J. ;
Halpern, Aaron L. ;
Kruglyak, Semyon ;
Margulies, Elliott H. ;
McVean, Gil ;
Bentley, David R. .
GENOME RESEARCH, 2017, 27 (01) :157-164
[7]  
Garrison E., 2012, arXiv, V1207, P3907, DOI 10.48550/arXiv.1207.3907
[8]   Discovery and genotyping of structural variation from long-read haploid genome sequence data [J].
Huddleston, John ;
Chaisson, Mark J. P. ;
Steinberg, Karyn Meltz ;
Warren, Wes ;
Hoekzema, Kendra ;
Gordon, David ;
Graves-Lindsay, Tina A. ;
Munson, Katherine M. ;
Kronenberg, Zev N. ;
Vives, Laura ;
Peluso, Paul ;
Boitano, Matthew ;
Chin, Chen-Shin ;
Korlach, Jonas ;
Wilson, Richard K. ;
Eichler, Evan E. .
GENOME RESEARCH, 2017, 27 (05) :677-685
[9]  
Langmead B, 2012, NAT METHODS, V9, P357, DOI 10.1038/nmeth.1923
[10]  
Li H, 2013, PREPRINT, P3997, DOI 10.48550/arXiv.1303.3997