Verification and validation of bioinformatics software without a gold standard: a case study of BWA and Bowtie

Times cited: 59
Authors
Giannoulatou, Eleni [1,2]
Park, Shin-Ho [1,2]
Humphreys, David T. [1,2]
Ho, Joshua W. K. [1,2]
Affiliations
[1] Victor Chang Cardiac Research Institute, Darlinghurst, NSW, Australia
[2] University of New South Wales, Sydney, NSW 2052, Australia
Source
BMC Bioinformatics | 2014 / Vol. 15
Keywords
Read alignment; Exome; Algorithms
DOI
10.1186/1471-2105-15-S16-S15
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: Bioinformatics software quality assurance is essential in genomic medicine. Systematic verification and validation of bioinformatics software is difficult because it is often impossible to obtain a realistic "gold standard" against which results can be evaluated. Here we apply Metamorphic Testing (MT), a technique from the software testing literature, to systematically test three widely used short-read sequence alignment programs.
Results: MT alleviates the problems caused by the lack of a gold standard by checking that the results from multiple executions of a program satisfy a set of expected or desirable properties, which can be derived from the software specification or from user expectations. We tested BWA, Bowtie and Bowtie2 using simulated data and one HapMap dataset. Interestingly, running the same aligner on a slightly modified input FASTQ file, for example one in which the reads have been randomly reordered, can change the alignment results. Furthermore, we found that the list of variant calls can be affected unless strict quality control is applied during variant calling.
Conclusion: Thorough testing of bioinformatics software is important for delivering clinical genomic medicine. This paper demonstrates an alternative testing framework that checks a program's properties rather than comparing its output with a gold standard, which greatly expands the number and repertoire of test cases that can be applied in practice.
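The metamorphic testing idea described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' test harness: the two aligners below are hypothetical exact-match toys standing in for BWA or Bowtie, and every function name, the round-robin placement rule and the example reads are assumptions made for illustration only. The metamorphic relation checked is the one the abstract mentions: reordering the reads in the input should not change where any individual read aligns.

```python
import random
from typing import Callable, Dict, List, Tuple

Read = Tuple[str, str]  # (read_id, sequence); hypothetical stand-in for a FASTQ record
Aligner = Callable[[str, List[Read]], Dict[str, int]]


def exact_hits(reference: str, seq: str) -> List[int]:
    """All start positions where seq occurs exactly in reference (overlaps allowed)."""
    hits, pos = [], reference.find(seq)
    while pos != -1:
        hits.append(pos)
        pos = reference.find(seq, pos + 1)
    return hits


def order_independent_aligner(reference: str, reads: List[Read]) -> Dict[str, int]:
    """Hypothetical toy aligner: leftmost exact hit per read, or -1 if unmapped."""
    return {rid: (exact_hits(reference, seq) or [-1])[0] for rid, seq in reads}


def order_sensitive_aligner(reference: str, reads: List[Read]) -> Dict[str, int]:
    """Hypothetical toy aligner that places multi-mapping reads round-robin over
    their hits, using a counter shared across the whole run, so a read's placement
    depends on how many multi-mapping reads were processed before it."""
    out: Dict[str, int] = {}
    counter = 0
    for rid, seq in reads:
        hits = exact_hits(reference, seq)
        if len(hits) > 1:
            out[rid] = hits[counter % len(hits)]
            counter += 1
        else:
            out[rid] = hits[0] if hits else -1
    return out


def seeded_shuffle(reads: List[Read], seed: int = 0) -> List[Read]:
    """Follow-up input: the same reads in a reproducible random order."""
    shuffled = list(reads)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def permutation_violations(
    align: Aligner,
    reference: str,
    reads: List[Read],
    permute: Callable[[List[Read]], List[Read]] = seeded_shuffle,
) -> List[str]:
    """Metamorphic relation: reordering the input reads must not change any read's
    reported position. No gold standard is used; two runs are compared with each
    other. Returns the sorted IDs of reads that violate the relation."""
    source = align(reference, reads)              # source test case
    follow_up = align(reference, permute(reads))  # follow-up test case
    return sorted(rid for rid in source if source[rid] != follow_up.get(rid))
```

For example, with reference `ACGTACGTTTGCA` and reads `[("r1", "ACGT"), ("r2", "ACGT"), ("r3", "TTGC"), ("r4", "GGGG")]`, the identical reads `r1` and `r2` both match at positions 0 and 4. Reversing the read order (`permute=lambda rs: list(reversed(rs))`) leaves the output of `order_independent_aligner` unchanged, but swaps the placements made by `order_sensitive_aligner`, so `permutation_violations` reports `["r1", "r2"]`. Whether such a violation is a bug depends on the specification: for multi-mapping reads an arbitrary placement may be acceptable, which is why each relation has to be derived from the specification or from user expectations. Against a real aligner, the same check would compare records keyed by read name across the two runs.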
Pages: 8