WEP: a high-performance analysis pipeline for whole-exome data

Citations: 34
Authors
D'Antonio, Mattia [1, 2]
De Meo, Paolo D'Onorio [2, 5, 6]
Paoletti, Daniele [2]
Elmi, Berardino [1, 2]
Pallocca, Matteo [2]
Sanna, Nico [5]
Picardi, Ernesto [1]
Pesole, Graziano [1, 3, 4]
Castrignano, Tiziana [2, 5]
Affiliations
[1] Univ Studi Bari, Dipartimento Biosci Biotecnol & Sci Farmacol, Bari, Italy
[2] Consorzio Interuniv Applicaz Supercalcolo Univ Ri, CASPUR, Rome, Italy
[3] CNR, Ist Biomembrane Bioenerget, Bari, Italy
[4] Ctr Excellence Genom CEGBA, Bari, Italy
[5] Consorzio Interuniv Supercalcolo, Bologna, Italy
[6] Sapienza Univ Roma, Dipartimento Biotecnol & Ematol, Rome, Italy
Source
BMC BIOINFORMATICS | 2013 / Vol. 14
Keywords
GENERATION SEQUENCING DATA; DISEASE-GENE DISCOVERY; MUTATIONS; VARIANTS; TOOL; TECHNOLOGIES; STRATEGIES; FRAMEWORK; ALIGNMENT; GENOMICS;
DOI
10.1186/1471-2105-14-S7-S11
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Background: The advent of massively parallel sequencing technologies (Next Generation Sequencing, NGS) has profoundly modified the landscape of human genetics. In particular, Whole Exome Sequencing (WES) is the NGS branch that focuses on the exonic regions of eukaryotic genomes; exomes are ideal for helping us understand high-penetrance allelic variation and its relationship to phenotype. A complete WES analysis involves several steps that need to be suitably designed and arranged into an efficient pipeline. Managing an NGS analysis pipeline and the huge amount of data it produces requires non-trivial IT skills and computational power.
Results: Our web resource WEP (Whole-Exome sequencing Pipeline web tool) performs a complete WES pipeline and provides easy access to intermediate and final results through a web interface. The WEP pipeline is composed of several steps: 1) verification of input integrity and quality checks, read trimming and filtering; 2) gapped alignment; 3) BAM conversion, sorting and indexing; 4) duplicate removal; 5) alignment optimization around insertion/deletion (indel) positions; 6) recalibration of quality scores; 7) single nucleotide and deletion/insertion polymorphism (SNP and DIP) variant calling; 8) variant annotation; 9) storage of results in custom databases to allow cross-linking, intersections, statistics and much more. To overcome the challenge of managing large amounts of data and to maximize the biological information extracted from them, our tool restricts the number of final results by filtering data against customizable thresholds, facilitating the identification of functionally significant variants. Default threshold values, tuned according to the most common approaches in recently published literature, are also provided when the analysis completes.
Conclusions: Through our tool a user can perform the whole analysis without knowing the underlying hardware and software architecture, working with both paired-end and single-end data. The interface provides easy and intuitive data submission and user-friendly visualization of annotated variants. Users without IT expertise can access, through WEP, the most up-to-date and tested WES algorithms, tuned to maximize the quality of called variants while minimizing artifacts and false positives. The web tool is available at the following web address: http://www.caspur.it/wep
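As a rough illustration of the workflow described in the abstract, the sketch below lists the nine stages in order and applies user-adjustable thresholds to a list of called variants. The stage labels paraphrase the abstract; the `Variant` fields, the threshold names, and the default values are hypothetical and are not taken from WEP.

```python
from dataclasses import dataclass

# Stage labels paraphrase the nine steps listed in the abstract.
PIPELINE_STAGES = [
    "input integrity and quality checks, read trimming and filtering",
    "gapped alignment",
    "BAM conversion, sorting and indexing",
    "duplicate removal",
    "alignment optimization around indel positions",
    "quality score recalibration",
    "SNP and DIP variant calling",
    "variant annotation",
    "result storage in custom databases",
]


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal record for a called variant."""
    chrom: str
    pos: int
    quality: float  # assumed call-quality score
    depth: int      # assumed read depth at the site


# Hypothetical defaults; WEP's actual default thresholds are not given here.
DEFAULT_THRESHOLDS = {"min_quality": 30.0, "min_depth": 10}


def filter_variants(variants, thresholds=None):
    """Keep variants that meet every threshold; user values override defaults."""
    limits = {**DEFAULT_THRESHOLDS, **(thresholds or {})}
    return [
        v for v in variants
        if v.quality >= limits["min_quality"] and v.depth >= limits["min_depth"]
    ]


if __name__ == "__main__":
    calls = [
        Variant("chr1", 12345, quality=55.0, depth=40),
        Variant("chr2", 67890, quality=12.0, depth=35),
        Variant("chrX", 13579, quality=48.0, depth=4),
    ]
    for number, stage in enumerate(PIPELINE_STAGES, start=1):
        print(f"{number}) {stage}")
    print(filter_variants(calls))
    print(filter_variants(calls, {"min_depth": 3}))
```

Merging user-supplied thresholds over a dictionary of defaults mirrors the abstract's description of customizable thresholds with provided default values, without implying anything about how WEP stores or applies them.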
Pages: 11