tableone: An open source Python package for producing summary statistics for research papers

Times cited: 110
Authors
Pollard, Tom J. [1]
Johnson, Alistair E. W. [1]
Raffa, Jesse D. [1]
Mark, Roger G. [1]
Affiliation
[1] MIT, Lab Computat Physiol, 77 Massachusetts Ave, Cambridge, MA 02139 USA
Funding
National Institutes of Health (NIH)
Keywords
descriptive statistics; Python; quantitative research
DOI
10.1093/jamiaopen/ooy012
Chinese Library Classification (CLC) number
R19 [Health care organization and services (health services administration)]
Abstract
Objectives: In quantitative research, understanding basic parameters of the study population is key to interpreting the results. As a result, the first table ("Table 1") of a research paper typically includes summary statistics for the study data. Our objectives are twofold. First, we seek to provide a simple, reproducible method for producing summary statistics for research papers in the Python programming language. Second, we seek to use the package to improve the quality of summary statistics reported in research papers.

Materials and Methods: The tableone package is developed following good-practice guidelines for scientific computing, and all code is made available under a permissive MIT License. A testing framework runs on a continuous integration server, helping to maintain code stability. Issues are tracked openly and public contributions are encouraged.

Results: The tableone package automatically compiles summary statistics into publishable formats such as CSV, HTML, and LaTeX. An executable Jupyter Notebook demonstrates application of the package to a subset of data from the MIMIC-III database. Tests such as Tukey's rule for outlier detection and Hartigan's Dip Test for modality are computed to highlight potential issues in summarizing the data.

Discussion and Conclusion: We present open source software that helps researchers carry out reproducible studies in Python, an increasingly popular language in scientific research. The toolkit is intended to mature over time with community feedback and input. Developing a common tool for summarizing data may help to promote good practice when used as a supplement to existing guidelines and recommendations. We encourage use of tableone alongside other methods of descriptive statistics and, in particular, visualization to ensure appropriate data handling. We also suggest seeking guidance from a statistician when using tableone for a research study, especially before submitting the study for publication.
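The kind of summary described in the Results can be illustrated with a short sketch. This is not tableone's implementation or API: it is a hypothetical, standard-library-only Python example (the function names tukey_outliers, summarize_continuous, summarize_categorical, table_one and to_csv, and the cohort data, are illustrative assumptions). It reports mean (SD) for continuous variables and n (%) for categorical ones, flags values outside Tukey's fences Q1 - 1.5 IQR and Q3 + 1.5 IQR, and writes the result as CSV. Hartigan's Dip Test is omitted, and the "inclusive" quartile convention is an assumption that may differ from the package's.

```python
import csv
import io
import statistics


def tukey_outliers(values, k=1.5):
    """Return the values lying outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    # Quartiles use the 'inclusive' method (an assumption); other quartile
    # conventions can move the fences slightly.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def summarize_continuous(values):
    """Return a 'mean (SD)' cell and the number of Tukey outliers."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return f"{mean:.1f} ({sd:.1f})", len(tukey_outliers(values))


def summarize_categorical(values):
    """Return an 'n (%)' cell for each level of a categorical variable."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    total = len(values)
    return {level: f"{c} ({100 * c / total:.1f})" for level, c in sorted(counts.items())}


def table_one(rows, continuous, categorical):
    """Build (variable, summary, note) rows for a simple, ungrouped Table 1."""
    table = [("n", str(len(rows)), "")]
    for col in continuous:
        cell, n_out = summarize_continuous([r[col] for r in rows])
        note = f"{n_out} outlier(s) by Tukey's rule" if n_out else ""
        table.append((f"{col}, mean (SD)", cell, note))
    for col in categorical:
        for level, cell in summarize_categorical([r[col] for r in rows]).items():
            table.append((f"{col} = {level}, n (%)", cell, ""))
    return table


def to_csv(table):
    """Serialize the table rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["variable", "summary", "note"])
    writer.writerows(table)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical cohort: an age of 120 is deliberately implausible.
    ages = [52, 55, 57, 58, 59, 60, 61, 120]
    sexes = ["F", "M", "F", "F", "M", "F", "M", "F"]
    cohort = [{"age": a, "sex": s} for a, s in zip(ages, sexes)]
    print(to_csv(table_one(cohort, continuous=["age"], categorical=["sex"])))
```

Running the script prints a small CSV table in which the age row carries a note that one value lies outside Tukey's fences. A flag like this is a prompt to inspect the data, in line with the abstract's advice to pair tableone with visualization, rather than a reason to drop the value automatically.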
Pages: 26-31
Number of pages: 6