Indel variant analysis of short-read sequencing data with Scalpel

Cited by: 88
Authors
Fang, Han [1 ,3 ]
Bergmann, Ewa A. [4 ]
Arora, Kanika [4 ]
Vacic, Vladimir [4 ]
Zody, Michael C. [4 ]
Iossifov, Ivan [1 ]
O'Rawe, Jason A. [2 ,3 ]
Wu, Yiyang [2 ,3 ]
Barron, Laura T. Jimenez [2 ,5 ]
Rosenbaum, Julie [1 ]
Ronemus, Michael [1 ]
Lee, Yoon-ha [1 ]
Wang, Zihua [1 ]
Dikoglu, Esra [4 ]
Jobanputra, Vaidehi [4 ,6 ]
Lyon, Gholson J. [3 ]
Wigler, Michael [1 ]
Schatz, Michael C. [1 ,7 ]
Narzisi, Giuseppe [1 ,4 ]
Affiliations
[1] Cold Spring Harbor Lab, Simons Ctr Quantitat Biol, Cold Spring Harbor, NY 11724 USA
[2] Cold Spring Harbor Lab, Stanley Inst Cognit Genom, Cold Spring Harbor, NY 11724 USA
[3] SUNY Stony Brook, Stony Brook, NY 11794 USA
[4] New York Genome Ctr, New York, NY USA
[5] Univ Nacl Autonoma Mexico, Ctr Ciencias Genom, Cuernavaca, Morelos, Mexico
[6] Columbia Univ, Med Ctr, New York, NY USA
[7] Johns Hopkins Univ, Dept Comp Sci, Baltimore, MD USA
Funding
National Science Foundation (USA); National Institutes of Health (USA)
Keywords
DE-NOVO; VARIATION DISCOVERY; MUTATION-RATE; INSERTIONS; FRAMEWORK; GENOME; DELETIONS;
DOI
10.1038/nprot.2016.150
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
As the second most common type of variation in the human genome, insertions and deletions (indels) have been linked to many diseases, but the discovery of indels of more than a few bases in size from short-read sequencing data remains challenging. Scalpel (http://scalpel.sourceforge.net) is open-source software for reliable indel detection based on the microassembly technique. It has been successfully used to discover mutations in novel candidate genes for autism, and it is extensively used in other large-scale studies of human diseases. This protocol gives an overview of the algorithm and describes how to use Scalpel to perform highly accurate indel calling from whole-genome and whole-exome sequencing data. We provide detailed instructions for an exemplary family-based de novo study, but we also characterize the other two supported modes of operation: single-sample and somatic analysis. Indel normalization, visualization and annotation of the mutations are also illustrated. Using a standard server, indel discovery and characterization in the exonic regions of the example sequencing data can be completed in ~5 h after read mapping.
Pages: 2529-2548
Page count: 20