De novo assembly of haplotype-resolved genomes with trio binning

Cited by: 319
Authors
Koren, Sergey [1]
Rhie, Arang [1]
Walenz, Brian P. [1]
Dilthey, Alexander T. [1,2]
Bickhart, Derek M. [3]
Kingan, Sarah B. [4]
Hiendleder, Stefan [5,6]
Williams, John L. [5]
Smith, Timothy P. L. [7]
Phillippy, Adam M. [1]
Affiliations
[1] Natl Human Genome Res Inst, Computat & Stat Genom Branch, Genome Informat Sect, Bethesda, MD 20892 USA
[2] Heinrich Heine Univ Dusseldorf, Inst Med Microbiol, Dusseldorf, North Rhine Wes, Germany
[3] ARS USDA, Cell Wall Biol & Utilizat Lab, Madison, WI USA
[4] Pacific Biosci, Menlo Pk, CA USA
[5] Univ Adelaide, Davies Res Ctr, Sch Anim & Vet Sci, Roseworthy, SA, Australia
[6] Univ Adelaide, Robinson Res Inst, Adelaide, SA, Australia
[7] ARS USDA, US Meat Anim Res Ctr, Clay Ctr, NE 68933 USA
Funding
National Institutes of Health (USA);
Keywords
VARIANTS; SEQUENCE; TOOL;
DOI
10.1038/nbt.4277
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Complex allelic variation hampers the assembly of haplotype-resolved sequences from diploid genomes. We developed trio binning, an approach that simplifies haplotype assembly by resolving allelic variation before assembly. In contrast with prior approaches, the effectiveness of our method improved with increasing heterozygosity. Trio binning uses short reads from two parental genomes to first partition long reads from an offspring into haplotype-specific sets. Each haplotype is then assembled independently, resulting in a complete diploid reconstruction. We used trio binning to recover both haplotypes of a diploid human genome and identified complex structural variants missed by alternative approaches. We sequenced an F1 cross between the cattle subspecies Bos taurus taurus and Bos taurus indicus and completely assembled both parental haplotypes with NG50 haplotig sizes of >20 Mb and 99.998% accuracy, surpassing the quality of current cattle reference genomes. We suggest that trio binning improves diploid genome assembly and will facilitate new studies of haplotype variation and inheritance.
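The partitioning step described in the abstract can be illustrated with a minimal sketch. It assumes that reads are binned by counting k-mers found in only one parent's short reads; the function names (`kmers`, `kmer_set`, `bin_reads`), the tiny `k`, and the simple majority rule are hypothetical choices for illustration, not the authors' implementation. Reverse complements, sequencing errors, and k-mer count thresholds are ignored.

```python
def kmers(seq, k):
    """Yield every substring of length k in seq (forward strand only)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def kmer_set(reads, k):
    """Collect the set of all k-mers seen in a list of reads."""
    seen = set()
    for read in reads:
        seen.update(kmers(read, k))
    return seen


def bin_reads(child_reads, maternal_short, paternal_short, k=5):
    """Assign each offspring read to the parent whose private k-mers it shares most.

    Assumption: a k-mer is 'private' if it occurs in one parent's short
    reads and not in the other's. Ties and reads with no private k-mers
    go to the 'unassigned' bin.
    """
    mat = kmer_set(maternal_short, k)
    pat = kmer_set(paternal_short, k)
    mat_only = mat - pat  # k-mers seen only in the mother
    pat_only = pat - mat  # k-mers seen only in the father

    bins = {"maternal": [], "paternal": [], "unassigned": []}
    for read in child_reads:
        m = sum(1 for km in kmers(read, k) if km in mat_only)
        p = sum(1 for km in kmers(read, k) if km in pat_only)
        if m > p:
            bins["maternal"].append(read)
        elif p > m:
            bins["paternal"].append(read)
        else:
            bins["unassigned"].append(read)
    return bins


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    result = bin_reads(
        child_reads=["CGTACGT", "GTTTGTA", "GGGGGGG"],
        maternal_short=["ACGTACGTAA"],
        paternal_short=["ACGTTTGTAA"],
    )
    print(result)
```

Each resulting bin would then be passed to a long-read assembler separately, which is what lets the two haplotypes be assembled independently. Greater divergence between the parents yields more private k-mers per read, consistent with the abstract's statement that the method improves with increasing heterozygosity.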
Pages: 1174-+
Page count: 11