Outlier Identification in Model-Based Cluster Analysis

Cited by: 0
Authors
Katie Evans
Tanzy Love
Sally W. Thurston
Affiliations
[1] Dupont, Department of Biostatistics & Computational Biology
[2] DuET Applied Statistics
[3] University of Rochester
Source
Journal of Classification | 2015 / Vol. 32
Keywords
Normal-mixture models; Influential points; MCLUST; Prior; National Hockey League
DOI
Not available
Abstract
In model-based clustering based on normal-mixture models, a few outlying observations can influence the cluster structure and number. This paper develops a method to identify these; however, it does not attempt to identify clusters amidst a large field of noisy observations. We identify outliers as those observations in a cluster with minimal membership proportion or for which the cluster-specific variance with and without the observation is very different. Results from a simulation study demonstrate the ability of our method to detect true outliers without falsely identifying many non-outliers, and improved performance over other approaches, under most scenarios. We use the contributed R package MCLUST for model-based clustering, but propose a modified prior for the cluster-specific variance which avoids degeneracies in estimation procedures. We also compare results from our outlier method to published results on National Hockey League data.
Pages: 63-84
Page count: 21