A new robust ratio estimator by modified Cook’s distance for missing data imputation

Cited by: 0
Author
Masayoshi Takahashi
Affiliation
[1] Nagasaki University, School of Information and Data Sciences
Source
Japanese Journal of Statistics and Data Science | 2022 / Vol. 5
Keywords
Ratio imputation; Ratio estimator; Missing; Outlier; Robust;
DOI
Not available
Abstract
In survey data, missing values are prevalent. In official economic statistics, where data are obtained through surveys, ratio imputation is often utilized to deal with missing data; however, outliers may have an influence on the imputation model. The objective of this article is to propose a new robust ratio estimator, named the TC-ratio estimator (ratio estimator with trimming based on Cook’s distance), which is robust against outliers on the vertical axis (variable $y$), on the horizontal axis (variable $x$), and on both axes ($x$ and $y$), for missing data imputation. Also, a novel way is suggested to automatically determine the number of outliers. To assess the performance of the new method, Monte Carlo simulations are conducted under 160 different data generation processes, each repeated in 10,000 simulation runs. Relative superiority of the new method is shown against the traditional robust ratio imputation methods, such as the ratio of medians, trimmed means, Winsorized means, and means by $M$-estimators. The current study finds that the new method outperforms these traditional methods when outliers are present only in $y$, only in $x$, and both in $x$ and $y$. Furthermore, when outliers are not present, the performance of this new method is approximately equal to the non-robust method.
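The general idea of trimming a ratio estimator by Cook's distance can be illustrated with a minimal sketch. This is not the TC-ratio estimator as defined in the article: the abstract does not give the modified Cook's distance or the rule for automatically determining the number of outliers, so the sketch below substitutes the standard Cook's distance for the no-intercept model $y = \beta x + \varepsilon$ with $\mathrm{Var}(\varepsilon) \propto x$ (whose weighted least squares solution is the ratio of means) and a conventional $4/n$ cutoff. The function names, the single-pass trimming, and the cutoff are all assumptions made for illustration, and $x > 0$ is assumed throughout.

```python
import numpy as np


def cooks_distance_ratio(x, y):
    """Standard Cook's distance for y = b*x + e with Var(e) proportional to x.

    Assumption: this is the textbook formula with one parameter, not the
    modified distance proposed in the article. Requires x > 0.
    """
    n = x.size
    beta = y.sum() / x.sum()              # ratio of means = WLS slope
    resid = (y - beta * x) / np.sqrt(x)   # residuals on the weighted scale
    s2 = (resid ** 2).sum() / (n - 1)     # residual variance, one parameter
    if s2 == 0.0:                         # perfect fit: nothing is influential
        return np.zeros(n)
    leverage = x / x.sum()
    return resid ** 2 * leverage / (s2 * (1.0 - leverage) ** 2)


def impute_trimmed_ratio(x, y, cutoff=None):
    """Fill missing y (np.nan) with ratio * x after trimming influential rows.

    Assumption: a single trimming pass with cutoff 4/n, a common rule of
    thumb, stands in for the article's automatic outlier-count rule.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = ~np.isnan(y)
    xo, yo = x[observed], y[observed]
    if cutoff is None:
        cutoff = 4.0 / xo.size
    keep = cooks_distance_ratio(xo, yo) <= cutoff
    ratio = yo[keep].sum() / xo[keep].sum()   # ratio of means on kept rows
    y_filled = y.copy()
    y_filled[~observed] = ratio * x[~observed]
    return y_filled, ratio


# Toy data: y = 2x, one gross outlier in y, one missing value to impute.
x = np.arange(1.0, 12.0)
y = 2.0 * x
y[9] = 200.0     # outlier on the vertical axis
y[10] = np.nan   # missing value
y_filled, ratio = impute_trimmed_ratio(x, y)
print(ratio, y_filled[10])
```

On this toy data the untrimmed ratio of means would be pulled to about 5.27 by the single outlier, whereas the trimmed ratio recovers 2.0 and the missing value is imputed as 22.0. Only missing values are filled; the flagged outlier is excluded from the fit but left unchanged in the data.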
Pages: 783-830
Number of pages: 47