The State of Software for Evolutionary Biology

Cited by: 27
Authors
Darriba, Diego [1]
Flouri, Tomas [1]
Stamatakis, Alexandros [1, 2]
Affiliations
[1] Heidelberg Inst Theoret Studies, Sci Comp Grp, Heidelberg, Germany
[2] Karlsruhe Inst Technol, Inst Theoret Informat, Karlsruhe, Germany
Keywords
software engineering quality; scientific computing; data analysis; numerical analysis; policy issues; evolutionary inference software; MAXIMUM-LIKELIHOOD; PHYLOGENETIC ANALYSIS; TOOL; PERFORMANCE; SEQUENCES; ALIGNMENT; MODEL
DOI
10.1093/molbev/msy014
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
With Next Generation Sequencing data being routinely used, evolutionary biology is transforming into a computational science. Thus, researchers have to rely on a growing number of increasingly complex software tools. All widely used core tools in the field have grown considerably in terms of the number of features and lines of code and, consequently, also with respect to software complexity. A topic that has received little attention is the software engineering quality of these widely used core analysis tools. Software developers appear to rarely assess the quality of their code, and this can have negative consequences for end-users. To this end, we assessed the code quality of 16 highly cited and compute-intensive tools from the broader area of evolutionary biology that are routinely used in current data analysis pipelines. These tools are mainly written in C/C++ (e.g., MrBayes, MAFFT, and SweepFinder) and Java (BEAST). Because the software engineering quality of the tools we analyzed is rather unsatisfactory, we provide a list of best practices for improving the quality of existing tools and list techniques that can be deployed for developing reliable, high-quality scientific software from scratch. Finally, we also discuss journal as well as science policy and, more importantly, funding issues that need to be addressed for improving software engineering quality as well as ensuring support for developing new and maintaining existing software. Our intention is to raise the awareness of the community regarding software engineering quality issues and to emphasize the substantial lack of funding for scientific software development.
Pages: 1037-1046
Page count: 10