A new robust ratio estimator by modified Cook’s distance for missing data imputation

Author
Masayoshi Takahashi
Affiliation
[1] Nagasaki University, School of Information and Data Sciences
Source
Japanese Journal of Statistics and Data Science | 2022 / Vol. 5
Keywords
Ratio imputation; Ratio estimator; Missing; Outlier; Robust;
DOI
Not available
Abstract
In survey data, missing values are prevalent. In official economic statistics, where data are obtained through surveys, ratio imputation is often used to deal with missing data; however, outliers may influence the imputation model. The objective of this article is to propose a new robust ratio estimator, named the TC-ratio estimator (ratio estimator with trimming based on Cook’s distance), which is robust against outliers on the vertical axis (variable $y$), on the horizontal axis (variable $x$), and on both axes ($x$ and $y$), for missing data imputation. Also, a novel way is suggested to automatically determine the number of outliers. To assess the performance of the new method, Monte Carlo simulations are conducted under 160 different data generation processes, each repeated in 10,000 simulation runs. Relative superiority of the new method is shown against the traditional robust ratio imputation methods, such as the ratio of medians, trimmed means, Winsorized means, and means by $M$-estimators. The current study finds that the new method outperforms these traditional methods when outliers are present only in $y$, only in $x$, and in both $x$ and $y$. Furthermore, when outliers are not present, the performance of this new method is approximately equal to that of the non-robust method.
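The general idea of trimming a ratio estimator by Cook’s distance can be sketched as follows. This is only an illustration under stated assumptions and is not the TC-ratio estimator as defined in the article: it uses the textbook Cook’s distance for a weighted regression through the origin (weights 1/x, which reproduces the classical ratio sum(y)/sum(x)), and a fixed rule-of-thumb cutoff of 4/n in place of the article's automatic determination of the number of outliers. The function names and the cutoff are hypothetical, and x is assumed to be strictly positive.

```python
import numpy as np


def ratio_estimate(x, y):
    """Classical ratio estimator: sum(y) / sum(x)."""
    return y.sum() / x.sum()


def cooks_distance_ratio_model(x, y):
    """Cook's distance for y = beta * x + e with Var(e) proportional to x.

    Dividing both sides by sqrt(x) turns the model into an ordinary
    regression through the origin with one parameter (p = 1).
    Assumes every x is strictly positive.
    """
    beta = ratio_estimate(x, y)
    leverage = x / x.sum()                     # hat values of the transformed model
    resid = (y - beta * x) / np.sqrt(x)        # residuals of the transformed model
    s2 = (resid ** 2).sum() / (len(x) - 1)     # residual variance, n - p degrees of freedom
    return resid ** 2 * leverage / (s2 * (1.0 - leverage) ** 2)


def trimmed_ratio_estimate(x, y, cutoff=None):
    """Drop points whose Cook's distance exceeds the cutoff, then re-estimate.

    The default cutoff 4 / n is a common rule of thumb and an assumption
    of this sketch, not the rule proposed in the article.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if cutoff is None:
        cutoff = 4.0 / len(x)
    keep = cooks_distance_ratio_model(x, y) <= cutoff
    return ratio_estimate(x[keep], y[keep]), keep


def impute(x_missing, beta):
    """Ratio imputation: predicted y for units where y is missing."""
    return beta * np.asarray(x_missing, dtype=float)
```

With observed pairs `x_obs`, `y_obs` and auxiliary values `x_mis` for the units whose `y` is missing, a call such as `beta, keep = trimmed_ratio_estimate(x_obs, y_obs)` followed by `impute(x_mis, beta)` gives the imputed values. A single pass like this can miss outliers that mask one another, which is one reason a fixed cutoff is only a stand-in here.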
Pages: 783-830
Page count: 47