Limitations of next-generation genome sequence assembly

Times cited: 498
Authors
Alkan, Can
Sajjadian, Saba
Eichler, Evan E. [1]
Affiliations
[1] Univ Washington, Sch Med, Dept Genome Sci, Seattle, WA 98195 USA
Funding
National Institutes of Health (USA)
Keywords
SEGMENTAL DUPLICATIONS; COPY-NUMBER; ELEMENTS;
DOI
10.1038/NMETH.1527
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
High-throughput sequencing technologies promise to transform the fields of genetics and comparative biology by delivering tens of thousands of genomes in the near future. Although it is feasible to construct de novo genome assemblies in a few months, there has been relatively little attention to what is lost by sole application of short sequence reads. We compared the recent de novo assemblies using the short oligonucleotide analysis package (SOAP), generated from the genomes of a Han Chinese individual and a Yoruban individual, to experimentally validated genomic features. We found that de novo assemblies were 16.2% shorter than the reference genome and that 420.2 megabase pairs of common repeats and 99.1% of validated duplicated sequences were missing from the genome. Consequently, over 2,377 coding exons were completely missing. We conclude that high-quality sequencing approaches must be considered in conjunction with high-throughput sequencing for comparative genomics analyses and studies of genome evolution.
Pages: 61-65
Page count: 5