Outlier Identification in Model-Based Cluster Analysis
Cited by: 0
Authors:
Katie Evans
Affiliation: Dupont, Department of Biostatistics & Computational Biology
Tanzy Love
Affiliation: Dupont, Department of Biostatistics & Computational Biology
Sally W. Thurston
Affiliations:
[1] Dupont, Department of Biostatistics & Computational Biology
[2] DuET Applied Statistics
[3] University of Rochester
Source:
Journal of Classification, 2015, Vol. 32
Keywords:
Normal-mixture models;
Influential points;
MCLUST;
Prior;
National Hockey League
DOI: not available
Abstract:
In model-based clustering based on normal-mixture models, a few outlying observations can influence both the cluster structure and the number of clusters. This paper develops a method to identify such observations; it does not, however, attempt to identify clusters amidst a large field of noisy observations. We identify outliers as those observations in a cluster with minimal membership proportion, or those for which the cluster-specific variance with and without the observation is very different. Results from a simulation study demonstrate the ability of our method to detect true outliers without falsely identifying many non-outliers, as well as improved performance over other approaches under most scenarios. We use the contributed R package MCLUST for model-based clustering, but propose a modified prior for the cluster-specific variance that avoids degeneracies in estimation procedures. We also compare results from our outlier method to published results on National Hockey League data.
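The two outlier criteria named in the abstract can be illustrated with a rough sketch. This is not the authors' procedure: the abstract does not give their thresholds or test statistics, so the function name `flag_outliers`, the cutoffs `min_prop` and `var_ratio`, the use of hard cluster labels in place of a fitted MCLUST model, and the restriction to one-dimensional data are all assumptions made for illustration.

```python
from statistics import variance


def flag_outliers(values, labels, min_prop=0.05, var_ratio=3.0):
    """Return sorted indices of observations flagged by either criterion.

    min_prop and var_ratio are hypothetical cutoffs, not values from the paper.
    """
    n = len(values)
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)

    flagged = set()
    for idx in clusters.values():
        # Criterion 1: the cluster holds a minimal share of all observations.
        if len(idx) / n < min_prop:
            flagged.update(idx)
            continue
        # Too few points to compare variances meaningfully (assumed cutoff).
        if len(idx) < 4:
            continue
        # Criterion 2: cluster variance with vs. without each observation.
        full_var = variance([values[i] for i in idx])
        for i in idx:
            loo_var = variance([values[j] for j in idx if j != i])
            if loo_var > 0 and full_var / loo_var > var_ratio:
                flagged.add(i)
    return sorted(flagged)


# Toy data: one inflated point in cluster 0 and a singleton cluster 2.
values = (
    [1.0, 1.1, 0.9, 1.2, 0.8, 1.0, 1.1, 0.9, 1.0, 6.0]
    + [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7, 10.1, 9.9]
    + [20.0]
)
labels = [0] * 10 + [1] * 10 + [2]
print(flag_outliers(values, labels))  # [9, 20]
```

In this toy data, index 9 is flagged because removing it shrinks the variance of its cluster sharply, and index 20 is flagged because its cluster holds under 5% of the observations. A faithful implementation would work from the fitted mixture model's membership proportions and cluster-specific covariance estimates.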