Assembly of chromosome-scale contigs by efficiently resolving repetitive sequences with long reads

Cited by: 66
Authors
Du, Huilong [1, 2]
Liang, Chengzhi [1, 2]
Affiliations
[1] Chinese Acad Sci, Innovat Acad Seed Design, Inst Genet & Dev Biol, State Key Lab Plant Genom, 1 Beichen West Rd 2, Beijing 100101, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
Keywords
GENOME ASSEMBLIES; NOVO
DOI
10.1038/s41467-019-13355-3
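Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09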
Abstract
The abundant repetitive sequences in complex eukaryotic genomes cause fragmented assemblies, which lose value as reference genomes, often due to incomplete gene sequences and unanchored or mispositioned contigs on chromosomes. Here we report a genome assembly method HERA, which resolves repeats efficiently by constructing a connection graph from an overlap graph. We test HERA on the genomes of rice, maize, human, and Tartary buckwheat with single-molecule sequencing and mapping data. HERA correctly assembles most of the previously unassembled regions, resulting in dramatically improved, highly contiguous genome assemblies with newly assembled gene sequences. For example, the maize contig N50 size reaches 61.2 Mb and the Tartary buckwheat genome comprises only 20 contigs. HERA can also be used to fill gaps and fix errors in reference genomes. The application of HERA will greatly improve the quality of new or existing assemblies of complex genomes.
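The abstract reports assembly contiguity as contig N50 (61.2 Mb for maize). As a minimal sketch of how that metric is defined, assuming the standard definition (the length L such that contigs of length at least L together cover at least half of the total assembly length), the function below sorts contig lengths in descending order and returns the length at which the running sum first reaches half of the total. The function name `n50` and the toy lengths are hypothetical illustrations, not taken from the paper or from HERA.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating their lengths.
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Hypothetical toy contig lengths in Mb (not data from the paper).
    toy = [80, 70, 50, 40, 30, 20, 10]
    print(n50(toy))           # 300 Mb total; 80 + 70 reaches 150 -> 70
    print(n50([30] * 10))     # same 300 Mb split into ten pieces -> 30
```

The second call shows why resolving repeats matters for this metric: the same total sequence split into more, shorter contigs yields a lower N50, so joining fragments across repeats raises N50 without changing assembly size.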
Pages: 10