On Model-Based Clustering of Directional Data with Heavy Tails

Cited by: 0
Authors
Yingying Zhang
Volodymyr Melnykov
Igor Melnykov
Affiliations
[1] Western Michigan University, Department of Statistics
[2] University of Alabama, Department of Information Systems, Statistics, and Management Science
[3] University of Minnesota Duluth, Department of Mathematics and Statistics
Keywords
EM algorithm; Directional data; Von Mises-Fisher distribution; Mixture model
DOI
Not available
Abstract
Directional statistics deals with data that can be naturally expressed in the form of vector directions. The von Mises-Fisher distribution is one of the most fundamental parametric models to describe directional data. Mixtures of von Mises-Fisher distributions represent a popular approach to handling heterogeneous populations. However, components of such models can be affected by the presence of mild outliers or cluster tails heavier than what can be accommodated by means of a von Mises-Fisher distribution. To relax these model limitations, a mixture of contaminated von Mises-Fisher distributions is proposed. The performance of the proposed methodology is tested on synthetic data and applied to text and genetics data. The obtained results demonstrate the importance of the proposed procedure and its superiority over the traditional mixture of von Mises-Fisher distributions in the presence of heavy tails.
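To make the heavy-tail idea concrete, the following is a minimal sketch of one plausible form of a contaminated von Mises-Fisher density, restricted to the unit sphere in three dimensions, where the normalizing constant has the closed form κ / (4π sinh κ). The two-part structure (a "good" component with concentration κ plus a "contaminating" component sharing the same mean direction but with reduced concentration ηκ, 0 < η < 1) is an assumption borrowed from the usual contaminated-normal construction and is not necessarily the parameterization used in the paper. The names `vmf_logpdf_s2`, `contaminated_vmf_pdf_s2`, `alpha`, and `eta` are hypothetical.

```python
import math


def vmf_logpdf_s2(x, mu, kappa):
    """Log-density of a von Mises-Fisher distribution on the unit sphere in R^3.

    x and mu are unit vectors given as length-3 sequences; kappa > 0.
    """
    dot = sum(a * b for a, b in zip(x, mu))
    # log(kappa / (4*pi*sinh(kappa))), rewritten to avoid overflow for large kappa:
    # log sinh(kappa) = kappa + log(1 - exp(-2*kappa)) - log(2)
    log_norm = (
        math.log(kappa)
        - math.log(2.0 * math.pi)
        - kappa
        - math.log1p(-math.exp(-2.0 * kappa))
    )
    return log_norm + kappa * dot


def contaminated_vmf_pdf_s2(x, mu, kappa, alpha, eta):
    """Assumed contaminated form: alpha * vMF(kappa) + (1 - alpha) * vMF(eta * kappa).

    alpha in (0, 1] is the share of 'good' points; eta in (0, 1) deflates the
    concentration of the contaminating part (an illustrative assumption).
    """
    good = math.exp(vmf_logpdf_s2(x, mu, kappa))
    bad = math.exp(vmf_logpdf_s2(x, mu, eta * kappa))
    return alpha * good + (1.0 - alpha) * bad


mu = (0.0, 0.0, 1.0)
antipode = (0.0, 0.0, -1.0)
plain = math.exp(vmf_logpdf_s2(antipode, mu, 10.0))
heavy = contaminated_vmf_pdf_s2(antipode, mu, 10.0, alpha=0.9, eta=0.1)
print(plain < heavy)  # the contaminated density puts more mass far from mu
```

Under this assumed form, points far from the mean direction receive noticeably more density than under a single von Mises-Fisher component, which is the behavior the abstract attributes to heavier cluster tails.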
Pages: 527-551
Number of pages: 24