Genome assembly comparison identifies structural variants in the human genome

Cited by: 116
Authors
Khaja, Razi
Zhang, Junjun
MacDonald, Jeffrey R.
He, Yongshu
Joseph-George, Ann M.
Wei, John
Rafiq, Muhammad A.
Qian, Cheng
Shago, Mary
Pantano, Lorena
Aburatani, Hiroyuki
Jones, Keith
Redon, Richard
Hurles, Matthew
Armengol, Lluis
Estivill, Xavier
Mural, Richard J.
Lee, Charles
Scherer, Stephen W.
Feuk, Lars
Affiliations
[1] Hosp Sick Children, Program Genet & Genom Biol, Toronto, ON M5G 1L7, Canada
[2] Univ Toronto, Dept Mol & Med Genet, Toronto, ON M5G 1L7, Canada
[3] MaRS Ctr, Ctr Appl Genom, Toronto, ON M5G 1L7, Canada
[4] S Inst Informat Technol, Dept Biosci, Commiss Sci & Technol Sustainable Dev, Islamabad 44000, Pakistan
[5] Charles Darwin SN, Ctr Genom Regulat, Genes & Dis Program, Barcelona 08003, Catalonia, Spain
[6] Univ Tokyo, Adv Sci & Technol Res Ctr, Genome Sci Lab, Tokyo 1538904, Japan
[7] Affymetrix Inc, Santa Clara, CA 95051 USA
[8] Wellcome Trust Sanger Inst, Cambridge CB10 1SA, England
[9] Pompeu Fabra Univ, Barcelona, Spain
[10] Natl Genotyping Ctr, Barcelona, Spain
[11] Windber Res Inst, Windber, PA 15963 USA
[12] Brigham & Womens Hosp, Dept Pathol, Boston, MA 02115 USA
[13] Harvard Univ, Sch Med, Boston, MA 02115 USA
Funding
Wellcome Trust (UK)
DOI
10.1038/ng1921
Chinese Library Classification
Q3 [Genetics]
Discipline classification codes
071007 ; 090102 ;
Abstract
Numerous types of DNA variation exist, ranging from SNPs to larger structural alterations such as copy number variants (CNVs) and inversions. Alignment of DNA sequence from different sources has been used to identify SNPs (refs. 1,2) and intermediate-sized variants (ISVs) (ref. 3). However, only a small proportion of total heterogeneity is characterized, and little is known of the characteristics of most smaller-sized (<50 kb) variants. Here we show that genome assembly comparison is a robust approach for identification of all classes of genetic variation. Through comparison of two human assemblies (Celera's R27c compilation and the Build 35 reference sequence), we identified megabases of sequence (in the form of 13,534 putative non-SNP events) that were absent, inverted or polymorphic in one assembly. Database comparison and laboratory experimentation further demonstrated overlap or validation for 240 variable regions and confirmed >1.5 million SNPs. Some differences were simple insertions and deletions, but in regions containing CNVs, segmental duplication and repetitive DNA, they were more complex. Our results uncover substantial undescribed variation in humans, highlighting the need for comprehensive annotation strategies to fully interpret genome scanning and personalized sequencing projects.
Pages: 1413-1418
Page count: 6
Related papers
30 in total
  • [1] The independence of our genome assemblies
    Adams, MD
    Sutton, GG
    Smith, HO
    Myers, EW
    Venter, JC
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2003, 100 (06) : 3025 - 3026
  • [2] Recent segmental duplications in the human genome
    Bailey, JA
    Gu, ZP
    Clark, RA
    Reinert, K
    Samonte, RV
    Schwartz, S
    Adams, MD
    Myers, EW
    Li, PW
    Eichler, EE
    [J]. SCIENCE, 2002, 297 (5583) : 1003 - 1007
  • [3] Toward the $1000 human genome
    Bennett, ST
    Barnes, C
    Cox, A
    Davies, L
    Brown, C
    [J]. PHARMACOGENOMICS, 2005, 6 (04) : 373 - 382
  • [4] Genome-wide detection of segmental duplications and potential assembly errors in the human genome sequence
    Cheung, J
    Estivill, X
    Khaja, R
    MacDonald, JR
    Lau, K
    Tsui, LC
    Scherer, SW
    [J]. GENOME BIOLOGY, 2003, 4 (04)
  • [5] Discovery of human inversion polymorphisms by comparative analysis of human and chimpanzee DNA sequence assemblies
    Feuk, L
    MacDonald, JR
    Tang, T
    Carson, AR
    Li, M
    Rao, G
    Khaja, R
    Scherer, SW
    [J]. PLOS GENETICS, 2005, 1 (04) : 489 - 498
  • [6] Structural variation in the human genome
    Feuk, L
    Carson, AR
    Scherer, SW
    [J]. NATURE REVIEWS GENETICS, 2006, 7 (02) : 85 - 97
  • [7] The DNA sequence of human chromosome 7
    Hillier, LW
    Fulton, RS
    Fulton, LA
    Graves, TA
    Pepin, KH
    Wagner-McPherson, C
    Layman, D
    Maas, J
    Jaeger, S
    Walker, R
    Wylie, K
    Sekhon, M
    Becker, MC
    O'Laughlin, MD
    Schaller, ME
    Fewell, GA
    Delehaunty, KD
    Miner, TL
    Nash, WE
    Cordes, M
    Du, H
    Sun, H
    Edwards, J
    Bradshaw-Cordum, H
    Ali, J
    Andrews, S
    Isak, A
    VanBrunt, A
    Nguyen, C
    Du, FY
    Lamar, B
    Courtney, L
    Kalicki, J
    Ozersky, P
    Bielicki, L
    Scott, K
    Holmes, A
    Harkins, R
    Harris, A
    Strong, CM
    Hou, SF
    Tomlinson, C
    Dauphin-Kohlberg, S
    Kozlowicz-Reilly, A
    Leonard, S
    Rohlfing, T
    Rock, SM
    Tin-Wollam, AM
    Abbott, A
    Minx, P
    [J]. NATURE, 2003, 424 (6945) : 157 - U2
  • [8] Detection of large-scale variation in the human genome
    Iafrate, AJ
    Feuk, L
    Rivera, MN
    Listewnik, ML
    Donahoe, PK
    Qi, Y
    Scherer, SW
    Lee, C
    [J]. NATURE GENETICS, 2004, 36 (09) : 949 - 951
  • [9] Whole-genome shotgun assembly and comparison of human genome assemblies
    Istrail, S
    Sutton, GG
    Florea, L
    Halpern, AL
    Mobarry, CM
    Lippert, R
    Walenz, B
    Shatkay, H
    Dew, I
    Miller, JR
    Flanigan, MJ
    Edwards, NJ
    Bolanos, R
    Fasulo, D
    Halldorsson, BV
    Hannenhalli, S
    Turner, R
    Yooseph, S
    Lu, F
    Nusskern, DR
    Shue, BC
    Zheng, XQH
    Zhong, F
    Delcher, AL
    Huson, DH
    Kravitz, SA
    Mouchard, L
    Reinert, K
    Remington, KA
    Clark, AG
    Waterman, MS
    Eichler, EE
    Adams, MD
    Hunkapiller, MW
    Myers, EW
    Venter, JC
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2004, 101 (07) : 1916 - 1921
  • [10] The human genome browser at UCSC
    Kent, WJ
    Sugnet, CW
    Furey, TS
    Roskin, KM
    Pringle, TH
    Zahler, AM
    Haussler, D
    [J]. GENOME RESEARCH, 2002, 12 (06) : 996 - 1006