A reference human genome dataset of the BGISEQ-500 sequencer

Cited by: 223
Authors
Huang, Jie [1 ]
Liang, Xinming [2 ]
Xuan, Yuankai [3 ]
Geng, Chunyu [2 ]
Li, Yuxiang [2 ]
Lu, Haorong [2 ]
Qu, Shoufang [1 ]
Mei, Xianglin [3 ]
Chen, Hongbo [1 ]
Yu, Ting [1 ]
Sun, Nan [1 ]
Rao, Junhua [2 ]
Wang, Jiahao [4 ]
Zhang, Wenwei [2 ]
Chen, Ying [2 ]
Liao, Sha [2 ]
Jiang, Hui [2 ]
Liu, Xin [2 ]
Yang, Zhaopeng [1 ]
Mu, Feng [2 ]
Gao, Shangxian [1 ]
Affiliations
[1] NIFDC, 2 Tiantan Xili Dongcheng Dist, Beijing 10050, Peoples R China
[2] BGI Shenzhen, Shenzhen 518083, Guangdong, Peoples R China
[3] State Food & Drug Adm, Hubei Ctr Med Equipment Qual Supervis & Testing, 24-9 Zhongbei East Rd, Wuhan 430000, Hubei Province, Peoples R China
[4] BGI Qingdao, Tuanjie Rd, Qingdao 266555, Shandong, Peoples R China
Source
GIGASCIENCE | 2017 / Volume 6 / Issue 05
Keywords
genomics; sequencing; next-generation sequencing; BGISEQ-500; FRAMEWORK; GENOTYPE; SNP;
DOI
10.1093/gigascience/gix024
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Background: BGISEQ-500 is a new desktop sequencer developed by BGI. Using DNA nanoball and combinational probe anchor synthesis developed from Complete Genomics™ sequencing technologies, it generates short reads at a large scale. Findings: Here, we present the first human whole-genome sequencing dataset of BGISEQ-500. The dataset was generated by sequencing the widely used cell line HG001 (NA12878) in two sequencing runs of paired-end 50 bp (PE50) and two sequencing runs of paired-end 100 bp (PE100). We also include examples of the raw images from the sequencer for reference. Finally, we identified variations using this dataset, estimated the accuracy of the variations, and compared it with that of the variations identified from similar amounts of publicly available HiSeq2500 data. Conclusions: We found similar single nucleotide polymorphism (SNP) detection accuracy for the BGISEQ-500 PE100 data (false positive rate [FPR] = 0.00020%, sensitivity = 96.20%) compared to the PE150 HiSeq2500 data (FPR = 0.00017%, sensitivity = 96.60%), and better SNP detection accuracy than the PE50 data (FPR = 0.0006%, sensitivity = 94.15%). For insertions and deletions (indels), however, we found lower accuracy for BGISEQ-500 data (FPR = 0.00069% and 0.00067% for PE100 and PE50, respectively; sensitivity = 88.52% and 70.93%) than for the HiSeq2500 data (FPR = 0.00032%, sensitivity = 96.28%). Our dataset can serve as the reference dataset, providing basic information not just for future development, but also for all research and applications based on the new sequencing platform.
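The abstract reports accuracy as a false positive rate (FPR) and a sensitivity but does not state the formulas. The following is a minimal sketch of the conventional definitions, assuming sensitivity = TP / (TP + FN) and FPR = FP / (FP + TN); the paper may define them differently. The `CallCounts` class, the function names, and the counts in the example are hypothetical and are not taken from the dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallCounts:
    """Hypothetical tallies from comparing variant calls with a truth set."""
    true_positives: int   # called variants that are in the truth set
    false_positives: int  # called variants that are not in the truth set
    false_negatives: int  # truth-set variants that were not called
    true_negatives: int   # confident non-variant positions left uncalled


def sensitivity(counts: CallCounts) -> float:
    """Fraction of true variants that were detected: TP / (TP + FN)."""
    total_true = counts.true_positives + counts.false_negatives
    if total_true == 0:
        raise ValueError("no true variants in the comparison")
    return counts.true_positives / total_true


def false_positive_rate(counts: CallCounts) -> float:
    """Fraction of non-variant positions wrongly called: FP / (FP + TN)."""
    total_negative = counts.false_positives + counts.true_negatives
    if total_negative == 0:
        raise ValueError("no non-variant positions in the comparison")
    return counts.false_positives / total_negative


# Illustrative counts only (assumed values, not results from the paper).
example = CallCounts(
    true_positives=962,
    false_positives=2,
    false_negatives=38,
    true_negatives=999_998,
)

if __name__ == "__main__":
    print(f"sensitivity = {sensitivity(example):.2%}")
    print(f"FPR = {false_positive_rate(example):.5%}")
```

Under this definition the FPR denominator is dominated by the very large number of non-variant positions in a genome, which is consistent with FPR values being several orders of magnitude smaller than the sensitivity values.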
Pages: 1-9
Page count: 9
Related papers
19 in total
[1]  
Andrews S, 2010, FASTQC QUALITY CONTR
[2]  
[Anonymous], 2017, HIGH CONF VCF BED DA
[3]   A framework for variation discovery and genotyping using next-generation DNA sequencing data [J].
DePristo, Mark A. ;
Banks, Eric ;
Poplin, Ryan ;
Garimella, Kiran V. ;
Maguire, Jared R. ;
Hartl, Christopher ;
Philippakis, Anthony A. ;
del Angel, Guillermo ;
Rivas, Manuel A. ;
Hanna, Matt ;
McKenna, Aaron ;
Fennell, Tim J. ;
Kernytsky, Andrew M. ;
Sivachenko, Andrey Y. ;
Cibulskis, Kristian ;
Gabriel, Stacey B. ;
Altshuler, David ;
Daly, Mark J. .
NATURE GENETICS, 2011, 43 (05) :491-+
[4]   Human Genome Sequencing Using Unchained Base Reads on Self-Assembling DNA Nanoarrays [J].
Drmanac, Radoje ;
Sparks, Andrew B. ;
Callow, Matthew J. ;
Halpern, Aaron L. ;
Burns, Norman L. ;
Kermani, Bahram G. ;
Carnevali, Paolo ;
Nazarenko, Igor ;
Nilsen, Geoffrey B. ;
Yeung, George ;
Dahl, Fredrik ;
Fernandez, Andres ;
Staker, Bryan ;
Pant, Krishna P. ;
Baccash, Jonathan ;
Borcherding, Adam P. ;
Brownley, Anushka ;
Cedeno, Ryan ;
Chen, Linsu ;
Chernikoff, Dan ;
Cheung, Alex ;
Chirita, Razvan ;
Curson, Benjamin ;
Ebert, Jessica C. ;
Hacker, Coleen R. ;
Hartlage, Robert ;
Hauser, Brian ;
Huang, Steve ;
Jiang, Yuan ;
Karpinchyk, Vitali ;
Koenig, Mark ;
Kong, Calvin ;
Landers, Tom ;
Le, Catherine ;
Liu, Jia ;
McBride, Celeste E. ;
Morenzoni, Matt ;
Morey, Robert E. ;
Mutch, Karl ;
Perazich, Helena ;
Perry, Kimberly ;
Peters, Brock A. ;
Peterson, Joe ;
Pethiyagoda, Charit L. ;
Pothuraju, Kaliprasad ;
Richter, Claudia ;
Rosenbaum, Abraham M. ;
Roy, Shaunak ;
Shafto, Jay ;
Sharanhovich, Uladzislau .
SCIENCE, 2010, 327 (5961) :78-81
[5]   Coming of age: ten years of next-generation sequencing technologies [J].
Goodwin, Sara ;
McPherson, John D. ;
McCombie, W. Richard .
NATURE REVIEWS GENETICS, 2016, 17 (06) :333-351
[6]  
Huang J, 2017, GIGASCIENCE DATABASE
[7]  
Huang J, 2016, GIGASCIENCE DATABASE
[8]   Systematic comparison of variant calling pipelines using gold standard personal exome variants [J].
Hwang, Sohyun ;
Kim, Eiru ;
Lee, Insuk ;
Marcotte, Edward M. .
SCIENTIFIC REPORTS, 2015, 5
[9]   Fast and accurate short read alignment with Burrows-Wheeler transform [J].
Li, Heng ;
Durbin, Richard .
BIOINFORMATICS, 2009, 25 (14) :1754-1760
[10]   Steps to ensure accuracy in genotype and SNP calling from Illumina sequencing data [J].
Liu, Qi ;
Guo, Yan ;
Li, Jiang ;
Long, Jirong ;
Zhang, Bing ;
Shyr, Yu .
BMC GENOMICS, 2012, 13