Robust covariance matrix estimation and identification of unusual data points: New tools

Times cited: 3
Authors
Garciga, Christian [1]
Verbrugge, Randal [1, 2]
Affiliations
[1] Fed Reserve Bank Cleveland, 1455 E 6th St, Cleveland, OH 44114 USA
[2] NBER, CRIW, 1455 E 6th St, Cleveland, OH 44114 USA
Keywords
Outlier identification; Fragility; Robust estimation; detMCD; RMVN; VARIANCE-ESTIMATION NNVE; FAST-FOOD INDUSTRY; OUTLIER DETECTION; MINIMUM-WAGES; NEW-JERSEY; REGRESSION; EMPLOYMENT; PENNSYLVANIA; EFFICIENCY; ALGORITHM
DOI
10.1016/j.rie.2021.03.001
Chinese Library Classification (CLC) number
F [Economics]
Subject classification code
02
Abstract
Most consistent estimators are prone to total breakdown in the presence of a handful of unusual data points (UDPs), which compromises inference. Robust estimation is a (seldom-used) solution, but the methods commonly used in applied research have severe drawbacks. In this paper, building upon methods that are relatively unknown outside the robust statistics literature, we provide an enhanced tool for robust estimation of the mean and covariance, useful both for robust estimation and for the detection of unusual data points. It is relatively fast and suitable for large data sets. We also provide a new robust cluster method, which is an input to our broader method but is also useful for standalone UDP detection or cluster analysis. We present a comparative study of numerous methods that is not available in the current literature. Testing indicates that our method performs on par with, and often better than, two of the best currently available methods. We also demonstrate that the issues we discuss are not merely hypothetical by applying our tools to real-world data and by re-examining two prominent economic studies. Our methods reveal that the central results of these studies are driven by a set of unusual points. (C) 2021 University of Venice. Published by Elsevier Ltd. All rights reserved.
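The abstract describes using robust estimates of the mean and covariance to detect unusual data points. As a rough illustration of that general idea only (this is not the authors' enhanced method, nor detMCD or RMVN), the sketch below assumes NumPy and uses a simplified MCD-style concentration step ("C-step"): it repeatedly refits the mean and covariance on the roughly half of the points with the smallest Mahalanobis distances, then flags points whose rescaled squared distance exceeds a chi-square cutoff. The function names `c_step_estimate` and `flag_unusual_2d`, the deterministic median/MAD starting subset, the subset size `h`, and the 2.5% tail level are illustrative assumptions. Flagging is restricted to bivariate data so that the chi-square median and quantile have closed forms.

```python
import numpy as np


def c_step_estimate(X, h=None, max_iter=100):
    """Robust location/scatter via concentration steps (simplified sketch).

    Starts from the h points closest to the coordinatewise median and
    repeatedly refits the mean and covariance on the h points with the
    smallest squared Mahalanobis distances, stopping once the subset is stable.
    Returns (mu, cov, d2), where d2 holds squared distances for all n points.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    if h is None:
        h = (n + p + 1) // 2  # about half the data (high-breakdown choice)
    # Hypothetical deterministic start: points nearest the median, MAD-scaled
    med = np.median(X, axis=0)
    mad = np.median(np.abs(X - med), axis=0)
    mad[mad == 0] = 1.0  # avoid division by zero for degenerate columns
    d0 = np.sum(((X - med) / mad) ** 2, axis=1)
    subset = np.sort(np.argsort(d0)[:h])
    for _ in range(max_iter):
        mu = X[subset].mean(axis=0)
        cov = np.cov(X[subset], rowvar=False)
        diff = X - mu
        # Squared Mahalanobis distance of every point to the current fit
        d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
        new_subset = np.sort(np.argsort(d2)[:h])
        if np.array_equal(new_subset, subset):
            break  # concentration has converged
        subset = new_subset
    return mu, cov, d2


def flag_unusual_2d(X, alpha=0.025):
    """Return indices of bivariate points with extreme robust distances.

    For 2 degrees of freedom, the chi-square median is 2*ln(2) and the
    (1 - alpha) quantile is -2*ln(alpha), so no stats library is needed.
    """
    X = np.asarray(X, dtype=float)
    assert X.shape[1] == 2, "closed-form cutoffs below assume p == 2"
    _, _, d2 = c_step_estimate(X)
    # The raw h-subset covariance is too small; rescale distances so their
    # median matches the chi-square(2) median (a simple consistency fix).
    d2 = d2 * (2.0 * np.log(2.0)) / np.median(d2)
    return np.flatnonzero(d2 > -2.0 * np.log(alpha))
```

For example, stacking 200 standard-normal points with 10 copies of the point (8, 8) and calling `flag_unusual_2d` should return all 10 planted indices plus a small share of the clean points, as expected at a 2.5% tail level. A production tool would also reweight the estimate and use several starting subsets, which this sketch omits.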
Pages: 176–202
Page count: 27