Assembly of chromosome-scale contigs by efficiently resolving repetitive sequences with long reads

Cited by: 66
Authors
Du, Huilong [1, 2]
Liang, Chengzhi [1, 2]
Affiliations
[1] Chinese Acad Sci, Innovat Acad Seed Design, Inst Genet & Dev Biol, State Key Lab Plant Genom, 1 Beichen West Rd 2, Beijing 100101, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
Keywords
GENOME ASSEMBLIES; NOVO;
DOI
10.1038/s41467-019-13355-3
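Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract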
The abundant repetitive sequences in complex eukaryotic genomes cause fragmented assemblies, which lose value as reference genomes, often due to incomplete gene sequences and unanchored or mispositioned contigs on chromosomes. Here we report a genome assembly method HERA, which resolves repeats efficiently by constructing a connection graph from an overlap graph. We test HERA on the genomes of rice, maize, human, and Tartary buckwheat with single-molecule sequencing and mapping data. HERA correctly assembles most of the previously unassembled regions, resulting in dramatically improved, highly contiguous genome assemblies with newly assembled gene sequences. For example, the maize contig N50 size reaches 61.2 Mb and the Tartary buckwheat genome comprises only 20 contigs. HERA can also be used to fill gaps and fix errors in reference genomes. The application of HERA will greatly improve the quality of new or existing assemblies of complex genomes.
Pages: 10
Related papers
31 in total
  • [1] Insights into corn genes derived from large-scale cDNA sequencing
    Alexandrov, Nickolai N.
    Brover, Vyacheslav V.
    Freidin, Stanislav
    Troukhan, Maxim E.
    Tatarinova, Tatiana V.
    Zhang, Hongyu
    Swaller, Timothy J.
    Lu, Yu-Ping
    Bouck, John
    Flavell, Richard B.
    Feldmann, Kenneth A.
    [J]. PLANT MOLECULAR BIOLOGY, 2009, 69 (1-2) : 179 - 194
  • [2] Recent segmental duplications in the human genome
    Bailey, JA
    Gu, ZP
    Clark, RA
    Reinert, K
    Samonte, RV
    Schwartz, S
    Adams, MD
    Myers, EW
    Li, PW
    Eichler, EE
    [J]. SCIENCE, 2002, 297 (5583) : 1003 - 1007
  • [3] Assembling large genomes with single-molecule sequencing and locality-sensitive hashing
    Berlin, Konstantin
    Koren, Sergey
    Chin, Chen-Shan
    Drake, James P.
    Landolin, Jane M.
    Phillippy, Adam M.
    [J]. NATURE BIOTECHNOLOGY, 2015, 33 (06) : 623 - +
  • [4] Boza V, 2014, LECT N BIOINFORMAT, V8701, P122, DOI 10.1007/978-3-662-44753-6_10
  • [5] Telescoper: de novo assembly of highly repetitive regions
    Bresler, Ma'ayan
    Sheehan, Sara
    Chan, Andrew H.
    Song, Yun S.
    [J]. BIOINFORMATICS, 2012, 28 (18) : I311 - I317
  • [6] Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions
    Burton, Joshua N.
    Adey, Andrew
    Patwardhan, Rupali P.
    Qiu, Ruolan
    Kitzman, Jacob O.
    Shendure, Jay
    [J]. NATURE BIOTECHNOLOGY, 2013, 31 (12) : 1119 - +
  • [7] Chaisson M. J., 2017, LNCS, P117
  • [8] Multi-platform discovery of haplotype-resolved structural variation in human genomes
    Chaisson, Mark J. P.
    Sanders, Ashley D.
    Zhao, Xuefang
    Malhotra, Ankit
    Porubsky, David
    Rausch, Tobias
    Gardner, Eugene J.
    Rodriguez, Oscar L.
    Guo, Li
    Collins, Ryan L.
    Fan, Xian
    Wen, Jia
    Handsaker, Robert E.
    Fairley, Susan
    Kronenberg, Zev N.
    Kong, Xiangmeng
    Hormozdiari, Fereydoun
    Lee, Dillon
    Wenger, Aaron M.
    Hastie, Alex R.
    Antaki, Danny
    Anantharaman, Thomas
    Audano, Peter A.
    Brand, Harrison
    Cantsilieris, Stuart
    Cao, Han
    Cerveira, Eliza
    Chen, Chong
    Chen, Xintong
    Chin, Chen-Shan
    Chong, Zechen
    Chuang, Nelson T.
    Lambert, Christine C.
    Church, Deanna M.
    Clarke, Laura
    Farrell, Andrew
    Flores, Joey
    Galeev, Timur
    Gorkin, David U.
    Gujral, Madhusudan
    Guryev, Victor
    Heaton, William Haynes
    Korlach, Jonas
    Kumar, Sushant
    Kwon, Jee Young
    Lam, Ernest T.
    Lee, Jong Eun
    Lee, Joyce
    Lee, Wan-Ping
    Lee, Sau Peng
    [J]. NATURE COMMUNICATIONS, 2019, 10 (1)
  • [9] Applications of next-generation sequencing: Genetic variation and the de novo assembly of human genomes
    Chaisson, Mark J. P.
    Wilson, Richard K.
    Eichler, Evan E.
    [J]. NATURE REVIEWS GENETICS, 2015, 16 (11) : 627 - 640
  • [10] Chin CS, 2016, NAT METHODS, V13, P1050, DOI 10.1038/nmeth.4035