Transforming variables to central normality

Cited by: 36
Authors
Raymaekers, Jakob [1]
Rousseeuw, Peter J. [1]
Affiliations
[1] Katholieke Univ Leuven, Celestijnenlaan 200B, B-3001 Leuven, Belgium
Keywords
Anomaly detection; Data preprocessing; Feature transformation; Outliers; Symmetrization
DOI
10.1007/s10994-021-05960-5
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Many real data sets contain numerical features (variables) whose distribution is far from normal (Gaussian). Instead, their distribution is often skewed. In order to handle such data it is customary to preprocess the variables to make them more normal. The Box-Cox and Yeo-Johnson transformations are well-known tools for this. However, the standard maximum likelihood estimator of their transformation parameter is highly sensitive to outliers, and will often try to move outliers inward at the expense of the normality of the central part of the data. We propose a modification of these transformations as well as an estimator of the transformation parameter that is robust to outliers, so the transformed data can be approximately normal in the center and a few outliers may deviate from it. It compares favorably to existing techniques in an extensive simulation study and on real data.
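For readers unfamiliar with the two transformations named in the abstract, the following is a minimal sketch of their classical textbook definitions for a single value and a given transformation parameter. It does not implement the modified transformations or the robust estimator proposed in the paper, which the abstract does not specify. The function names `box_cox` and `yeo_johnson` and the parameter name `lam` are illustrative choices, not taken from the source.

```python
import math


def box_cox(x, lam):
    """Classical Box-Cox transform of one strictly positive value."""
    if x <= 0:
        raise ValueError("Box-Cox is only defined for strictly positive values")
    if lam == 0:
        # Limiting case of (x**lam - 1) / lam as lam -> 0
        return math.log(x)
    return (x ** lam - 1.0) / lam


def yeo_johnson(x, lam):
    """Classical Yeo-Johnson transform of one real value (any sign)."""
    if x >= 0:
        if lam == 0:
            return math.log1p(x)  # log(x + 1)
        return ((x + 1.0) ** lam - 1.0) / lam
    # Negative branch uses the mirrored exponent 2 - lam
    if lam == 2:
        return -math.log1p(-x)  # -log(1 - x)
    return -((1.0 - x) ** (2.0 - lam) - 1.0) / (2.0 - lam)
```

In both families, a parameter value of 1 leaves the shape of the data unchanged (Yeo-Johnson is then the identity, Box-Cox a shift by one), while smaller values compress the right tail. Because the standard maximum likelihood estimate of the parameter is fitted to all observations, a few extreme values can dominate that choice, which is the sensitivity the abstract describes.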
Pages: 4953-4975
Number of pages: 23