Missing Values and Directional Outlier Detection in Model-Based Clustering

Times cited: 1
Authors
Tong, Hung [1]
Tortora, Cristina [2]
Affiliations
[1] Univ Alabama, Tuscaloosa, AL 35487 USA
[2] San Jose State Univ, San Jose, CA 95192 USA
Funding
National Science Foundation (USA)
Keywords
Model-based clustering; Outliers; Missing data; Contaminated normal distribution; Multiple scaled distributions; EM algorithm; Maximum-likelihood estimation; Mixture models; Parsimonious mixtures; Discriminant analysis; Simulating data; Incomplete data; R package; Multivariate; Selection
DOI
10.1007/s00357-023-09450-2
Chinese Library Classification (CLC) number
O1 [Mathematics]
Subject classification code
0701; 070101
Abstract
Model-based clustering uncovers heterogeneity in a data set in order to extract valuable insights. Because outliers are common in practice, robust methods for model-based clustering have been proposed. However, many of these methods are severely limited in applications where partially observed records are common, because their existing frameworks often assume complete data only. Here, a mixture of multiple scaled contaminated normal (MSCN) distributions is extended, using the expectation-conditional maximization (ECM) algorithm, to accommodate data sets with values missing at random. The extension preserves the mixture's ability to yield robust parameter estimates and to perform automatic outlier detection separately for each principal component. In this fitting framework, the MSCN marginal density is approximated using the inversion formula for the characteristic function. Extensive simulation studies on incomplete data sets with outliers are conducted to evaluate the parameter estimates and to compare the clustering performance and outlier detection of the proposed model with those of other mixtures.
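Two ingredients of the abstract, per-direction outlier detection and the characteristic-function inversion for a marginal density, can be made concrete with a short numerical sketch. The Python below is a minimal, stdlib-only illustration under stated assumptions, not the authors' implementation. It assumes the usual MSCN form in which the principal-component scores y = Γᵀ(x − μ) are independent and each follows α_h N(0, λ_h) + (1 − α_h) N(0, η_h λ_h). The class `MSCNComponent`, its methods, and all parameter values are hypothetical. `good_probs` returns the posterior probability that a point is "good" in each principal direction, and `marginal_pdf_inversion` recovers the density of a single observed coordinate by numerically inverting its characteristic function. As a check, that result is compared with brute-force enumeration of the 2^d normal components of the same marginal, which is feasible only for small d. ECM updates, the missing-at-random E-step, and marginals over several observed coordinates are not reproduced here.

```python
import math
from itertools import product


def normal_pdf(y, var):
    """Density of N(0, var) evaluated at y."""
    return math.exp(-0.5 * y * y / var) / math.sqrt(2.0 * math.pi * var)


class MSCNComponent:
    """One MSCN component (hypothetical helper, not the authors' code).

    mu: location vector of length d
    gamma: d x d orthogonal matrix; gamma[j][h] is entry j of principal direction h
    lam: variances lambda_h along the principal directions
    alpha: proportions alpha_h of "good" points per direction, in (0, 1]
    eta: variance inflation factors eta_h > 1 for "bad" points per direction
    """

    def __init__(self, mu, gamma, lam, alpha, eta):
        self.mu, self.gamma, self.lam = mu, gamma, lam
        self.alpha, self.eta = alpha, eta
        self.d = len(mu)

    def scores(self, x):
        """Principal-component scores y = Gamma^T (x - mu)."""
        return [sum(self.gamma[j][h] * (x[j] - self.mu[j]) for j in range(self.d))
                for h in range(self.d)]

    def _parts(self, h, y):
        """Good and bad contributions of score y in direction h."""
        a, lam_h, e = self.alpha[h], self.lam[h], self.eta[h]
        return a * normal_pdf(y, lam_h), (1.0 - a) * normal_pdf(y, e * lam_h)

    def pdf(self, x):
        """Product over directions of univariate contaminated normal densities."""
        dens = 1.0
        for h, y in enumerate(self.scores(x)):
            good, bad = self._parts(h, y)
            dens *= good + bad
        return dens

    def good_probs(self, x):
        """Posterior probability that x is 'good' in each principal direction."""
        probs = []
        for h, y in enumerate(self.scores(x)):
            good, bad = self._parts(h, y)
            probs.append(good / (good + bad))
        return probs

    def cf_centered(self, j, t):
        """Characteristic function psi(t) of X_j - mu_j (real and even in t)."""
        val = 1.0
        for h in range(self.d):
            s2 = (self.gamma[j][h] * t) ** 2
            a, lam_h, e = self.alpha[h], self.lam[h], self.eta[h]
            val *= a * math.exp(-0.5 * lam_h * s2) + (1.0 - a) * math.exp(-0.5 * e * lam_h * s2)
        return val

    def marginal_pdf_inversion(self, j, xj, n=2000):
        """f_j(x) = (1/pi) * integral_0^inf cos(t (x - mu_j)) psi(t) dt via Simpson's rule."""
        base_var = sum(self.gamma[j][h] ** 2 * self.lam[h] for h in range(self.d))
        upper = math.sqrt(80.0 / base_var)  # psi(t) <= exp(-base_var t^2 / 2) < e^-40 beyond here
        step, dx, total = upper / n, xj - self.mu[j], 0.0
        for k in range(n + 1):  # n must be even for Simpson's rule
            weight = 1 if k in (0, n) else (4 if k % 2 else 2)
            t = k * step
            total += weight * math.cos(t * dx) * self.cf_centered(j, t)
        return total * step / (3.0 * math.pi)

    def marginal_pdf_exact(self, j, xj):
        """Check: X_j is a mixture of 2^d normals, one per good/bad labelling."""
        total = 0.0
        for labels in product((False, True), repeat=self.d):
            weight, var = 1.0, 0.0
            for h, bad in enumerate(labels):
                weight *= (1.0 - self.alpha[h]) if bad else self.alpha[h]
                var += self.gamma[j][h] ** 2 * self.lam[h] * (self.eta[h] if bad else 1.0)
            total += weight * normal_pdf(xj - self.mu[j], var)
        return total


# Illustrative parameters (not taken from the paper).
theta = math.pi / 6
gamma = [[math.cos(theta), -math.sin(theta)], [math.sin(theta), math.cos(theta)]]
comp = MSCNComponent(mu=[0.0, 1.0], gamma=gamma, lam=[2.0, 0.5],
                     alpha=[0.9, 0.8], eta=[10.0, 20.0])
# A point 6 units from mu along the first principal direction only.
x_dir = [comp.mu[j] + 6.0 * gamma[j][0] for j in range(2)]
print(comp.good_probs(x_dir))  # low in direction 0, high in direction 1
print(comp.marginal_pdf_inversion(0, 0.7), comp.marginal_pdf_exact(0, 0.7))
```

The point `x_dir` is flagged as a likely outlier only along the first principal direction, which is the kind of directional flag the abstract refers to. The brute-force check grows as 2^d, which is one plausible motivation for a characteristic-function approach in higher dimensions. This is an observation about the sketch, not a claim about the paper's reasoning.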
Pages: 480-513
Page count: 34