Using GNU Make to Manage the Workflow of Data Analysis Projects

Cited by: 2
Authors
Baker, Peter [1]
Affiliation
[1] Univ Queensland, Sch Publ Hlth, Herston, Qld 4006, Australia
Source
JOURNAL OF STATISTICAL SOFTWARE | 2020 / Vol. 94 / Issue CN1
Keywords
GNU Make; Make; reproducible research; R; rmarkdown; Sweave; Stata; SAS
DOI
10.18637/jss.v094.c01
Chinese Library Classification
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Data analysis projects invariably involve a series of steps such as reading, cleaning, summarizing and plotting data, statistical analysis and reporting. To facilitate reproducible research, rather than relying on an ad hoc point-and-click, cut-and-paste approach, we typically break these tasks into manageable chunks, using a separate file of statistical, programming or text-processing syntax for each step, including the final report. Real-world data analysis is often iterative because many of these steps may need to be repeated any number of times. Repeating these steps manually is error-prone: necessary steps may be left out, or reported results may not reflect the most recent data set or syntax. GNU Make can automate the mundane task of regenerating output based on the dependencies between syntax and data files. In addition to helping manage and document the workflow of a complex data analysis project, such automation can help minimize errors and make the project more reproducible. It is relatively simple to construct Makefiles for small data analysis projects. As projects grow, difficulties arise because GNU Make has no built-in rules for statistical and related software. Without such rules, Makefiles can become unwieldy and error-prone. This article addresses these issues by providing GNU Make pattern rules for R, Sweave, rmarkdown, SAS, Stata, Perl and Python to streamline the management of data analysis and reporting projects. The rules are used by adding a single line to a project Makefile. Standard program options can also be modified. An overall strategy for Makefile construction is outlined and illustrated with simple and complex examples.
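The mechanism the abstract relies on can be sketched in a few lines of Python: a pattern rule maps a source file to an output file by keeping the shared stem, and an output is regenerated only if it is missing or older than something it depends on. This is a minimal illustration of standard GNU Make semantics, not the rules distributed with the article. The function names `pattern_target` and `is_out_of_date`, the file names `clean.R` and `raw.csv`, and the `%.Rout: %.R` rule shape are hypothetical choices made for this sketch.

```python
import os


def pattern_target(source: str, src_suffix: str = ".R", tgt_suffix: str = ".Rout") -> str:
    """Map a source file to its target the way a suffix-style pattern rule
    such as `%.Rout: %.R` would (hypothetical rule): the stem matched by `%`
    is kept and only the suffix changes."""
    if not source.endswith(src_suffix):
        raise ValueError(f"{source!r} does not match the pattern %{src_suffix}")
    stem = source[: -len(src_suffix)]
    return stem + tgt_suffix


def is_out_of_date(target: str, prerequisites: list) -> bool:
    """Make's basic rebuild test: the target must be regenerated if it does
    not exist, or if any prerequisite was modified strictly later than it.
    A missing prerequisite raises FileNotFoundError here; Make would instead
    look for a rule to create it."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)
```

For example, `pattern_target("analysis/clean.R")` returns `"analysis/clean.Rout"`, and `is_out_of_date("analysis/clean.Rout", ["analysis/clean.R", "data/raw.csv"])` becomes true as soon as either the script or the data file is edited. As in Make, a prerequisite with the same timestamp as the target does not trigger a rebuild.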
Pages: 1-46
Number of pages: 46
Related papers (50 in total)
  • [1] Make It Simple - An Empirical Analysis of GNU Make Feature Use in Open Source Projects
    Martin, Douglas H.
    Cordy, James R.
    Adams, Bram
    Antoniol, Giulio
    2015 IEEE 23RD INTERNATIONAL CONFERENCE ON PROGRAM COMPREHENSION ICPC 2015, 2015: 207 - 217
  • [2] BioMake: a GNU make-compatible utility for declarative workflow management
    Holmes, Ian H.
    Mungall, Christopher J.
    BIOINFORMATICS, 2017, 33 (21) : 3502 - 3504
  • [3] Using earned-value analysis to better manage projects
    Hayes, Heather
    2002, Advanstar Communications (26):
  • [4] Using GNU make to automate the recompile of VHDL SoC designs
    McKinney, MD
    SYSTEM ON CHIP DESIGN LANGUAGES: EXTENDED PAPERS: BEST OF FDL'01 AND HDLCON'01, 2002: 113 - 127
  • [5] USING METRICS TO MANAGE SOFTWARE PROJECTS
    WELLER, EF
    COMPUTER, 1994, 27 (09) : 27 - 33
  • [6] Using software Tools to manage hydraulics projects
    de los Angeles Suarez-Medina, Maria
    Astudillo-Enriquez, Citlalli
    TECNOLOGIA Y CIENCIAS DEL AGUA, 2013, 4 (03) : 195 - 202
  • [7] Manage Data and Analysis
    Lyons, R.
    EUROPEAN JOURNAL OF PUBLIC HEALTH, 2017, 27
  • [8] Free MS data prediction and analysis with GNU polyxmass
    Jaffe, S
    SCIENTIST, 2004, 18 (18): 38 - 38
  • [9] A practical data processing workflow for multi-OMICS projects
    Kohl, Michael
    Megger, Dominik A.
    Trippler, Martin
    Meckel, Hagen
    Ahrens, Maike
    Bracht, Thilo
    Weber, Frank
    Hoffmann, Andreas-Claudius
    Baba, Hideo A.
    Sitek, Barbara
    Schlaak, Joerg F.
    Meyer, Helmut E.
    Stephan, Christian
    Eisenacher, Martin
    BIOCHIMICA ET BIOPHYSICA ACTA-PROTEINS AND PROTEOMICS, 2014, 1844 (01): 52 - 62
  • [10] Using Geoweaver to Make Snow Mapping Workflow FAIR
    Alnaim, Ahmed
    Sun, Ziheng
    2022 IEEE 18TH INTERNATIONAL CONFERENCE ON E-SCIENCE (ESCIENCE 2022), 2022: 409 - 410