Using model-assisted calibration methods to improve efficiency of regression analyses using two-phase samples or pooled samples under complex survey designs
Cited by: 0
Author:
Wang, Lingxiao [1, 2]
Affiliations:
[1] Univ Virginia, Dept Stat, 148 Amphitheater Way, Charlottesville, VA 22903 USA
[2] NCI, Biostat Branch, Div Canc Epidemiol & Genet, Rockville, MD 20850 USA
Keywords:
calibration;
complex survey data analysis;
data integration;
regression analysis;
two-phase design;
COHORT;
RISK;
ESTIMATORS;
GLUCOSE;
DISEASE;
DOI:
10.1093/biomtc/ujaf092
Chinese Library Classification (CLC) number:
Q [Biological Sciences];
Discipline classification codes:
07 ;
0710 ;
09 ;
Abstract:
Two-phase sampling designs are frequently applied in epidemiological studies and large-scale health surveys. In such designs, certain variables are collected exclusively within a second-phase random subsample of the initial first-phase sample, often due to factors such as high costs, response burden, or constraints on data collection or assessment. Consequently, second-phase sample estimators can be inefficient due to the diminished sample size. Model-assisted calibration methods have been used to improve the efficiency of second-phase estimators in regression analysis. However, limited literature provides valid finite population inferences of the calibration estimators that use appropriate calibration auxiliary variables while simultaneously accounting for the complex sample designs in the first- and second-phase samples. Moreover, no literature considers the "pooled design" where some covariates are measured exclusively in certain repeated survey cycles. This paper proposes calibrating the sample weights for the second-phase sample to the weighted first-phase sample based on score functions of the regression model that uses predictions of the second-phase variable for the first-phase sample. We establish the consistency of estimation using calibrated weights and provide variance estimation for the regression coefficients under the two-phase design or the pooled design nested within complex survey designs. Empirical evidence highlights the efficiency and robustness of the proposed calibration compared to existing calibration and imputation methods. Data examples from the National Health and Nutrition Examination Survey are provided.
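The calibration idea in the abstract can be illustrated with a small simulation. The following is only a minimal sketch under strong simplifying assumptions, not the paper's estimator: it uses a linear outcome model, linear (chi-square distance) calibration, a simple random second-phase subsample, and made-up phase-1 weights. It also omits variance estimation. All function names, variable names, and simulated values are hypothetical.

```python
import numpy as np


def linear_calibrate(d, A, totals):
    """Linear calibration: return w = d * (1 + A @ lam) such that A.T @ w == totals."""
    M = A.T @ (d[:, None] * A)
    lam = np.linalg.solve(M, totals - A.T @ d)
    return d * (1.0 + A @ lam)


def simulate_and_calibrate(seed=0, n1=2000, n2=400):
    rng = np.random.default_rng(seed)

    # Phase 1: z and y are observed for everyone; w1 are illustrative survey weights.
    z = rng.normal(size=n1)
    x = 0.8 * z + rng.normal(scale=0.6, size=n1)  # observed only in phase 2
    y = 1.0 + 2.0 * x + rng.normal(size=n1)
    w1 = rng.uniform(50.0, 150.0, size=n1)

    # Phase 2: simple random subsample (an assumption), with design weights d2.
    in2 = np.zeros(n1, dtype=bool)
    in2[rng.choice(n1, size=n2, replace=False)] = True
    d2 = w1[in2] * n1 / n2

    # Step 1: fit a prediction model for x on phase 2, then predict x for all of phase 1.
    P = np.column_stack([np.ones(n1), z, y])
    coef = np.linalg.solve(P[in2].T @ (d2[:, None] * P[in2]), P[in2].T @ (d2 * x[in2]))
    x_hat = P @ coef

    # Step 2: fit the outcome regression on phase 1 with x_hat in place of x.
    X_hat = np.column_stack([np.ones(n1), x_hat])
    beta_tilde = np.linalg.solve(X_hat.T @ (w1[:, None] * X_hat), X_hat.T @ (w1 * y))

    # Step 3: use the score functions of that working model as auxiliary variables.
    U = X_hat * (y - X_hat @ beta_tilde)[:, None]
    A = np.column_stack([np.ones(n1), U])
    totals = A.T @ w1  # weighted phase-1 totals that phase 2 must reproduce

    # Step 4: calibrate the phase-2 weights to the weighted phase-1 totals.
    w2 = linear_calibrate(d2, A[in2], totals)

    # Step 5: fit the regression of interest on phase 2 with the true x and calibrated weights.
    X2 = np.column_stack([np.ones(n2), x[in2]])
    beta = np.linalg.solve(X2.T @ (w2[:, None] * X2), X2.T @ (w2 * y[in2]))
    return w2, A[in2], totals, beta
```

Calling `simulate_and_calibrate()` returns the calibrated weights, the phase-2 auxiliary matrix, the phase-1 totals, and the fitted coefficients. Linear calibration is used here only because it has a closed form; it can produce negative weights, which other distance functions such as raking avoid.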