Indel variant analysis of short-read sequencing data with Scalpel

Cited by: 88
Authors
Fang, Han [1, 3]
Bergmann, Ewa A. [4]
Arora, Kanika [4]
Vacic, Vladimir [4]
Zody, Michael C. [4]
Iossifov, Ivan [1]
O'Rawe, Jason A. [2, 3]
Wu, Yiyang [2, 3]
Barron, Laura T. Jimenez [2, 5]
Rosenbaum, Julie [1]
Ronemus, Michael [1]
Lee, Yoon-ha [1]
Wang, Zihua [1]
Dikoglu, Esra [4]
Jobanputra, Vaidehi [4, 6]
Lyon, Gholson J. [3]
Wigler, Michael [1]
Schatz, Michael C. [1, 7]
Narzisi, Giuseppe [1, 4]
Affiliations
[1] Cold Spring Harbor Lab, Simons Ctr Quantitat Biol, Cold Spring Harbor, NY 11724 USA
[2] Cold Spring Harbor Lab, Stanley Inst Cognit Genom, Cold Spring Harbor, NY 11724 USA
[3] SUNY Stony Brook, Stony Brook, NY 11794 USA
[4] New York Genome Ctr, New York, NY USA
[5] Univ Nacl Autonoma Mexico, Ctr Ciencias Genom, Cuernavaca, Morelos, Mexico
[6] Columbia Univ, Med Ctr, New York, NY USA
[7] Johns Hopkins Univ, Dept Comp Sci, Baltimore, MD USA
Funding
US National Science Foundation; US National Institutes of Health
Keywords
DE-NOVO; VARIATION DISCOVERY; MUTATION-RATE; INSERTIONS; FRAMEWORK; GENOME; DELETIONS;
DOI
10.1038/nprot.2016.150
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification code
071010 ; 081704 ;
Abstract
As the second most common type of variation in the human genome, insertions and deletions (indels) have been linked to many diseases, but the discovery of indels of more than a few bases in size from short-read sequencing data remains challenging. Scalpel (http://scalpel.sourceforge.net) is open-source software for reliable indel detection based on the microassembly technique. It has been successfully used to discover mutations in novel candidate genes for autism, and it is extensively used in other large-scale studies of human diseases. This protocol gives an overview of the algorithm and describes how to use Scalpel to perform highly accurate indel calling from whole-genome and whole-exome sequencing data. We provide detailed instructions for an exemplary family-based de novo study, but we also characterize the other two supported modes of operation: single-sample and somatic analysis. Indel normalization, visualization and annotation of the mutations are also illustrated. Using a standard server, indel discovery and characterization in the exonic regions of the example sequencing data can be completed in ~5 h after read mapping.
Pages: 2529-2548
Page count: 20