Read clouds uncover variation in complex regions of the human genome

Cited by: 44
Authors
Bishara, Alex [1]
Liu, Yuling [1, 2]
Weng, Ziming [3]
Kashef-Haghighi, Dorna [1]
Newburger, Daniel E. [4]
West, Robert [3]
Sidow, Arend [3, 5]
Batzoglou, Serafim [1]
Affiliations
[1] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
[2] Stanford Univ, Dept Chem, Stanford, CA 94305 USA
[3] Stanford Univ, Sch Med, Dept Pathol, Stanford, CA 94305 USA
[4] Biomed Informat Training Program, Stanford, CA 94305 USA
[5] Stanford Univ, Sch Med, Dept Genet, Stanford, CA 94305 USA
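Funding
National Institutes of Health (USA)
Keywords
COPY NUMBER VARIATION; SEGMENTAL DUPLICATIONS; VARIATION DISCOVERY; ACCURATE; IMPACT
DOI
10.1101/gr.191189.115
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract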
Although an increasing amount of human genetic variation is being identified and recorded, determining variants within repeated sequences of the human genome remains a challenge. Most population and genome-wide association studies have therefore been unable to consider variation in these regions. Core to the problem is the lack of a sequencing technology that produces reads with sufficient length and accuracy to enable unique mapping. Here, we present a novel methodology of using read clouds, obtained by accurate short-read sequencing of DNA derived from long fragment libraries, to confidently align short reads within repeat regions and enable accurate variant discovery. Our novel algorithm, Random Field Aligner (RFA), captures the relationships among the short reads governed by the long read process via a Markov Random Field. We utilized a modified version of the Illumina TruSeq synthetic long-read protocol, which yielded shallow-sequenced read clouds. We test RFA through extensive simulations and apply it to discover variants on the NA12878 human sample, for which shallow TruSeq read cloud sequencing data are available, and on an invasive breast carcinoma genome that we sequenced using the same method. We demonstrate that RFA facilitates accurate recovery of variation in 155 Mb of the human genome, including 94% of 67 Mb of segmental duplication sequence and 96% of 11 Mb of transcribed sequence, that are currently hidden from short-read technologies.
Pages: 1570-1580
Page count: 11