Matching Methods for Observational Studies Derived from Large Administrative Databases

Cited by: 16
Authors
Yu, Ruoqi [1]
Silber, Jeffrey H. [2]
Rosenbaum, Paul R. [1]
Affiliations
[1] Univ Penn, Wharton Sch, Dept Stat, Philadelphia, PA 19104 USA
[2] Univ Penn, Dept Pediat, Perelman Sch Med, Philadelphia, PA 19104 USA
Keywords
Causal inference; fine balance; Glover's algorithm; observational study; optimal caliper; optimal matching; propensity score; bias
DOI
10.1214/19-STS699
Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
We propose new optimal matching techniques for large administrative data sets. In current practice, very large matched samples are constructed by subdividing the population and solving a series of smaller problems, for instance, matching men to men and separately matching women to women. Without simplification of some kind, the time required to optimally match T treated individuals to T controls selected from C >= T potential controls grows much faster than linearly with the number of people to be matched (the required time is of order O{(T + C)^3}), so splitting one large problem into many small problems greatly accelerates the computations. This common practice has several disadvantages that we describe. In its place, we propose a single match, using everyone, that accelerates the computations in a different way. In particular, we use an iterative form of Glover's algorithm for a doubly convex bipartite graph to determine an optimal caliper for the propensity score, radically reducing the number of candidate matches; then we optimally match in a large but much sparser graph. In this graph, a modified form of near-fine balance can be used on a much larger scale, improving its effectiveness. We illustrate the method using data from US Medicaid, matching children receiving surgery at a children's hospital to similar children receiving surgery at a hospital that mostly treats adults. In the example, we form 38,841 matched pairs from 159,527 potential controls, controlling for 29 covariates plus 463 Principal Surgical Procedures, plus 973 Principal Diagnoses. The method is implemented in an R package bigmatch available from CRAN.
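The caliper step described in the abstract can be illustrated with a small sketch. It is not the bigmatch implementation: it assumes that "optimal caliper" means the smallest caliper on a one-dimensional score that still lets every treated unit be paired with a distinct control, and the function names max_matching_size and smallest_feasible_caliper are hypothetical. Feasibility for a given caliper is checked with Glover's greedy rule for convex bipartite graphs; the search enumerates all pairwise distances, which is only practical for small illustrations.

```python
import bisect
import heapq


def max_matching_size(treated, controls, caliper):
    """Size of a maximum matching when each treated unit may only be paired
    with a control whose score is within `caliper` (Glover's greedy rule).
    Both score lists must be sorted in increasing order."""
    intervals = []
    for p in treated:
        # Indices of the first and last control inside [p - caliper, p + caliper].
        lo = bisect.bisect_left(controls, p - caliper)
        hi = bisect.bisect_right(controls, p + caliper) - 1
        if lo <= hi:
            intervals.append((lo, hi))
    intervals.sort()

    heap = []      # right endpoints of treated units that can use control j
    k = 0          # next interval to activate
    matched = 0
    for j in range(len(controls)):
        while k < len(intervals) and intervals[k][0] <= j:
            heapq.heappush(heap, intervals[k][1])
            k += 1
        while heap and heap[0] < j:
            heapq.heappop(heap)   # interval ended before j: stays unmatched
        if heap:
            heapq.heappop(heap)   # give control j to the most urgent treated unit
            matched += 1
    return matched


def smallest_feasible_caliper(treated, controls):
    """Smallest caliper that lets every treated unit get its own control,
    or None if no caliper does (for example, too few controls)."""
    treated, controls = sorted(treated), sorted(controls)
    # Assumption for this sketch: enumerate all pairwise distances as candidates.
    candidates = sorted({abs(t - c) for t in treated for c in controls})
    if not candidates:
        return None
    if max_matching_size(treated, controls, candidates[-1]) < len(treated):
        return None
    lo, hi = 0, len(candidates) - 1
    while lo < hi:  # feasibility is monotone in the caliper, so bisect
        mid = (lo + hi) // 2
        if max_matching_size(treated, controls, candidates[mid]) == len(treated):
            hi = mid
        else:
            lo = mid + 1
    return candidates[lo]


if __name__ == "__main__":
    # Integer scores (e.g. propensity scores scaled by 100) avoid rounding issues.
    print(smallest_feasible_caliper([10, 50, 52], [5, 12, 49, 60, 90]))
```

Once such a caliper is known, only treated-control pairs inside it need to be kept as candidate edges, which is what makes the subsequent optimal matching problem sparse.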
Pages: 338-355
Number of pages: 18