Imputation in High-Dimensional Economic Data as Applied to the Agricultural Resource Management Survey

Times cited: 13
Authors
Robbins, Michael W. [1]
Ghosh, Sujit K. [2]
Habiger, Joshua D. [3]
Affiliations
[1] Univ Missouri, Dept Stat, Columbia, MO 65211 USA
[2] N Carolina State Univ, Dept Stat, Raleigh, NC 27695 USA
[3] Oklahoma State Univ, Dept Stat, Stillwater, OK 74078 USA
Keywords
ARMS; Gaussian copula; Imputation; Markov chain Monte Carlo; Missing data; Multiple imputation; Multivariate data; Model
DOI
10.1080/01621459.2012.734158
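Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification code
020208; 070103; 0714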
Abstract
In this article, we consider imputation in the USDA's Agricultural Resource Management Survey (ARMS) data, a complex, high-dimensional economic dataset. We develop a robust joint model for ARMS data, which requires that variables be transformed using a suitable class of marginal densities (e.g., the skew normal family). We assume that the transformed variables may be linked through a Gaussian copula, which enables construction of the joint model via a sequence of conditional linear models. We also discuss the criteria used to select the predictors for each conditional model. To develop an imputation method that is compatible with these model assumptions, we propose a regression-based technique that allows flexibility in the selection of conditional models while providing a valid joint distribution. In this procedure, labeled iterative sequential regression (ISR), parameter estimates and imputations are obtained using a Markov chain Monte Carlo sampling method. Finally, we apply the proposed method to the full ARMS data and present a thorough data analysis that gauges the appropriateness of the resulting imputations. Our results demonstrate the effectiveness of the proposed algorithm and illustrate the specific deficiencies of existing methods. Supplementary materials for this article are available online.
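The abstract outlines the ISR idea at a high level: transform each variable to a Gaussian scale, build the joint model as a sequence of conditional linear models, and alternate MCMC draws of parameters and missing values. The Python sketch below illustrates that loop for only two positive variables, under stated assumptions; it is not the authors' implementation. A log transform stands in for the fitted skew-normal marginals, the priors are the standard noninformative ones, and all function names (`isr_sketch`, `draw_marginal`, `draw_regression`, `make_demo_data`) and the synthetic data are hypothetical.

```python
import math
import random


def draw_marginal(z, rng):
    """Posterior draw of (mu, sigma^2) for z ~ N(mu, sigma^2), prior p(mu, sigma^2) ~ 1/sigma^2."""
    n = len(z)
    mean = sum(z) / n
    ss = sum((v - mean) ** 2 for v in z)
    sigma2 = ss / rng.gammavariate((n - 1) / 2, 2)  # ss / chi-square(n - 1)
    mu = rng.gauss(mean, math.sqrt(sigma2 / n))
    return mu, sigma2


def draw_regression(x, y, rng):
    """Posterior draw of (a, b, sigma^2) for y = a + b*x + e, prior p(a, b, sigma^2) ~ 1/sigma^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b_hat = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    rss = sum((yi - my - b_hat * (xi - mx)) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / rng.gammavariate((n - 2) / 2, 2)  # rss / chi-square(n - 2)
    # With x centred, slope and intercept are independent given sigma^2.
    b = rng.gauss(b_hat, math.sqrt(sigma2 / sxx))
    c = rng.gauss(my, math.sqrt(sigma2 / n))  # mean of y at x = mx
    return c - b * mx, b, sigma2


def isr_sketch(x1, x2, n_iter=200, seed=1):
    """Impute None entries in two positive columns; returns one completed copy of each."""
    rng = random.Random(seed)
    # Marginal transformation: log is a hypothetical stand-in for a skew-normal fit.
    z1 = [math.log(v) if v is not None else None for v in x1]
    z2 = [math.log(v) if v is not None else None for v in x2]
    miss1 = [i for i, v in enumerate(z1) if v is None]
    miss2 = [i for i, v in enumerate(z2) if v is None]
    for z, miss in ((z1, miss1), (z2, miss2)):
        start = sum(v for v in z if v is not None) / (len(z) - len(miss))
        for i in miss:
            z[i] = start  # crude starting values: observed mean
    for _ in range(n_iter):
        # Parameter step for z1 ~ N(mu, s1), z2 | z1 ~ N(a + b*z1, s2) (s1, s2 are variances).
        mu, s1 = draw_marginal(z1, rng)
        a, b, s2 = draw_regression(z1, z2, rng)
        # Imputation step. A missing z1 appears in BOTH factors of the joint
        # density f(z1) f(z2 | z1), so its full conditional combines them.
        for i in miss1:
            prec = 1 / s1 + b * b / s2
            mean = (mu / s1 + b * (z2[i] - a) / s2) / prec
            z1[i] = rng.gauss(mean, math.sqrt(1 / prec))
        # A missing z2 appears only in the last factor.
        for i in miss2:
            z2[i] = rng.gauss(a + b * z1[i], math.sqrt(s2))
    # Back-transform imputations; observed values are returned untouched.
    out1 = [v if v is not None else math.exp(z) for v, z in zip(x1, z1)]
    out2 = [v if v is not None else math.exp(z) for v, z in zip(x2, z2)]
    return out1, out2


def make_demo_data(n=100, seed=0):
    """Hypothetical synthetic data: a correlated log-normal pair with some values deleted."""
    rng = random.Random(seed)
    x1, x2 = [], []
    for _ in range(n):
        u = rng.gauss(0, 1)
        x1.append(math.exp(u))
        x2.append(math.exp(1 + 0.8 * u + 0.6 * rng.gauss(0, 1)))
    for i in range(0, n, 7):
        x1[i] = None
    for i in range(3, n, 5):
        x2[i] = None
    return x1, x2


if __name__ == "__main__":
    x1, x2 = make_demo_data()
    c1, c2 = isr_sketch(x1, x2)
    print([round(c1[i], 3) for i in range(0, 28, 7)])
```

Because a missing first variable enters both factors of f(z1) f(z2 | z1), its draw uses both; simply regressing each variable on the others in turn need not correspond to any joint distribution. This is one way to read the abstract's emphasis on a valid joint distribution. Keeping the completed data from several widely spaced iterations, rather than only the last one, would give multiple imputations.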
Pages: 81-95
Number of pages: 15
Cited references
46 in total
[1] [Anonymous], 1999, 99054 TNOVGZPG.
[2] [Anonymous], 2000, Survey Methodology.
[3] Azzalini, A., 1985, Scandinavian Journal of Statistics, 12, 171.
[4] Banker, D., 2007, ARMS PHASE 3 DATA PR.
[5] Beaton, A. E., 1964, RB-64-51, Educational Testing Service.
[6] Berger, P. G.; Ofek, E., 1995, Diversification's Effect on Firm Value, Journal of Financial Economics, 37(1), 39-65.
[7] Burgette, L. F.; Reiter, J. P., 2010, Multiple Imputation for Missing Data via Sequential Regression Trees, American Journal of Epidemiology, 172(9), 1070-1076.
[8] Clogg, C. C.; Rubin, D. B.; Schenker, N.; Schultz, B.; Weidman, L., 1991, Multiple Imputation of Industry and Occupation Codes in Census Public-Use Samples Using Bayesian Logistic Regression, Journal of the American Statistical Association, 86(413), 68-78.
[9] Earp, M. S., 2006, Technical Report.
[10] Fox, J., 2011, An R Companion to Applied Regression, 2nd ed.