Optimal caliper widths for propensity-score matching when estimating differences in means and differences in proportions in observational studies

Cited by: 2810
Authors
Austin, Peter C. [1, 2, 3]
Affiliations
[1] Inst Clin Evaluat Sci, Toronto, ON M4N 3M5, Canada
[2] Univ Toronto, Dalla Lana Sch Publ Hlth Sci, Toronto, ON M5S 1A1, Canada
[3] Univ Toronto, Dept Hlth Management Policy & Evaluat, Toronto, ON M5S 1A1, Canada
Funding
Canadian Institutes of Health Research
Keywords
propensity score; observational study; binary data; risk difference; propensity-score matching; Monte Carlo simulations; bias; matching; MONTE-CARLO; ODDS RATIOS; PERFORMANCE; TREAT; RISK;
DOI
10.1002/pst.433
Chinese Library Classification number
R9 [Pharmacy]
Discipline classification code
1007
Abstract
In a study comparing the effects of two treatments, the propensity score is the probability of assignment to one treatment conditional on a subject's measured baseline covariates. Propensity-score matching is increasingly being used to estimate the effects of exposures using observational data. In the most common implementation of propensity-score matching, pairs of treated and untreated subjects are formed whose propensity scores differ by at most a pre-specified amount (the caliper width). There has been little research into the optimal caliper width. We conducted an extensive series of Monte Carlo simulations to determine the optimal caliper width for estimating differences in means (for continuous outcomes) and risk differences (for binary outcomes). When estimating differences in means or risk differences, we recommend that researchers match on the logit of the propensity score using calipers of width equal to 0.2 of the standard deviation of the logit of the propensity score. When at least some of the covariates were continuous, then either this value, or one close to it, minimized the mean square error of the resultant estimated treatment effect. It also eliminated at least 98% of the bias in the crude estimator, and it resulted in confidence intervals with approximately the correct coverage rates. Furthermore, the empirical type I error rate was approximately correct. When all of the covariates were binary, then the choice of caliper width had a much smaller impact on the performance of estimation of risk differences and differences in means. Copyright (C) 2010 John Wiley & Sons, Ltd.
Pages: 150-161
Number of pages: 12