Highly accurate long-read HiFi sequencing data for five complex genomes

Cited by: 198
Authors
Hon, Ting [1]
Mars, Kristin [1]
Young, Greg [1]
Tsai, Yu-Chih [1]
Karalius, Joseph W. [1]
Landolin, Jane M. [2]
Maurer, Nicholas [3]
Kudrna, David [4, 5]
Hardigan, Michael A. [6]
Steiner, Cynthia C. [7]
Knapp, Steven J. [6]
Ware, Doreen [8, 9]
Shapiro, Beth [3, 10]
Peluso, Paul [1]
Rank, David R. [1]
Affiliations
[1] Pacific Biosci Calif Inc, 1305 OBrien Dr, Menlo Pk, CA 94025 USA
[2] Ravel Biotechnol Inc, 953 Indiana St, San Francisco, CA 94107 USA
[3] Univ Calif Santa Cruz, Dept Ecol & Evolutionary Biol, Santa Cruz, CA 95064 USA
[4] Univ Arizona, Arizona Genom Inst, Tucson, AZ 85721 USA
[5] Univ Arizona, Sch Plant Sci, Tucson, AZ 85721 USA
[6] Univ Calif Davis, Dept Plant Sci, One Shields Ave, Davis, CA 95616 USA
[7] San Diego Zoo Global, Beckman Ctr Conservat Res, Conservat Genet, 15600 San Pasqual Valley Rd, Escondido, CA 92027 USA
[8] Cold Spring Harbor Lab, POB 100, Cold Spring Harbor, NY 11724 USA
[9] USDA ARS, Plant Soil & Nutr Res Unit, Ithaca, NY 14853 USA
[10] Univ Calif Santa Cruz, Howard Hughes Med Inst, Santa Cruz, CA 95064 USA
Funding
National Institute of Food and Agriculture (USA)
Keywords
SINGLE; CAPTURE;
DOI
10.1038/s41597-020-00743-4
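Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract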
The PacBio® HiFi sequencing method yields highly accurate long-read sequencing datasets with read lengths averaging 10-25 kb and accuracies greater than 99.5%. These accurate long reads can be used to improve results for complex applications such as single nucleotide and structural variant detection, genome assembly, assembly of difficult polyploid or highly repetitive genomes, and assembly of metagenomes. Currently, there is a need for sample datasets both to evaluate the benefits of these long accurate reads and to develop bioinformatic tools, including genome assemblers, variant callers, and haplotyping algorithms. We present deep coverage HiFi datasets for five complex samples including the two inbred model genomes Mus musculus and Zea mays, as well as two complex genomes, octoploid Fragaria × ananassa and the diploid anuran Rana muscosa. Additionally, we release sequence data from a mock metagenome community. The datasets reported here can be used without restriction to develop new algorithms and explore complex genome structure and evolution. Data were generated on the PacBio Sequel II System.
Pages: 11