Adaptive kernel fuzzy clustering for missing data

Cited by: 3
Authors
Rodrigues, Anny K. G. [1 ]
Ospina, Raydonal [1 ]
Ferreira, Marcelo R. P. [2 ]
Affiliations
[1] Univ Fed Pernambuco, CCEN, Dept Estat, CASTLab, Recife, PE, Brazil
[2] Univ Fed Paraiba, Ctr Ciencias Exatas & Nat, Dept Estat, DataLab, Joao Pessoa, Paraiba, Brazil
Source
PLOS ONE | 2021 / Vol. 16 / Issue 11
Keywords
MULTIPLE IMPUTATION; ALGORITHM; FRAMEWORK; VALUES;
DOI
10.1371/journal.pone.0259266
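Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes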
07 ; 0710 ; 09 ;
Abstract
Many machine learning procedures, including clustering analysis, are often affected by missing values. This work aims to propose and evaluate a Kernel Fuzzy C-means clustering algorithm considering the kernelization of the metric with local adaptive distances (VKFCM-K-LP) under three types of strategies to deal with missing data. The first strategy, called Whole Data Strategy (WDS), performs clustering only on the complete part of the dataset, i.e., it discards all instances with missing data. The second approach uses the Partial Distance Strategy (PDS), in which partial distances are computed among all available resources and then re-scaled by the reciprocal of the proportion of observed values. The third technique, called Optimal Completion Strategy (OCS), computes missing values iteratively as auxiliary variables in the optimization of a suitable objective function. The clustering results were evaluated according to different metrics. The best performance of the clustering algorithm was achieved under the PDS and OCS strategies. Under the OCS approach, new datasets were derived and the missing values were estimated dynamically in the optimization process. The results of clustering under the OCS strategy also presented a superior performance when compared to the resulting clusters obtained by applying the VKFCM-K-LP algorithm on a version where missing values are previously imputed by the mean or the median of the observed values.
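The rescaling described for PDS, and the filtering described for WDS, can be illustrated with a minimal sketch. It assumes a plain squared Euclidean distance rather than the kernelized adaptive distance used by VKFCM-K-LP, and it marks missing values with NaN; the function names `partial_sq_distance` and `complete_cases` are hypothetical and do not come from the paper.

```python
import math


def partial_sq_distance(x, v):
    """Squared Euclidean partial distance between an instance x and a prototype v.

    Assumption: missing entries of x are encoded as NaN. The sum runs over the
    observed coordinates only and is then multiplied by p / n_obs, i.e. the
    reciprocal of the proportion of observed values.
    """
    observed = [j for j, xj in enumerate(x) if not math.isnan(xj)]
    if not observed:
        raise ValueError("instance has no observed values")
    partial = sum((x[j] - v[j]) ** 2 for j in observed)
    return partial * len(x) / len(observed)


def complete_cases(rows):
    """WDS-style filtering: keep only the instances without any missing value."""
    return [r for r in rows if not any(math.isnan(val) for val in r)]


if __name__ == "__main__":
    nan = float("nan")
    data = [[1.0, nan, 3.0, 5.0], [0.5, 2.5, 1.5, 4.0]]
    prototype = [0.0, 2.0, 1.0, 5.0]
    # PDS keeps both instances and rescales the distance of the incomplete one.
    print([partial_sq_distance(row, prototype) for row in data])
    # WDS discards the first instance because it has a missing coordinate.
    print(complete_cases(data))
```

For a fully observed instance the scaling factor is 1, so the partial distance reduces to the ordinary squared Euclidean distance; this is why PDS can use every instance while WDS must drop the incomplete ones.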
Pages: 33
Related papers
50 in total
  • [31] Estimating Missing Value in Microarray Data Using Fuzzy Clustering and Gene Ontology
    Mohammadi, Azadeh
    Saraee, Mohammad Hossein
    2008 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE, PROCEEDINGS, 2008, : 382 - 385
  • [32] Missing data imputation by nearest-neighbor trained BP for fuzzy clustering
    Zhang, Li, 1600, Binary Information Press (11):
  • [33] Data reducing algorithm of support vector machine based on fuzzy kernel clustering
    Wang, Fang
    Yang, Hui-Zhong
    Dongbei Daxue Xuebao/Journal of Northeastern University, 2007, 28 (SUPPL. 1): : 185 - 188
  • [34] Fuzzy c-means clustering for data with tolerance using kernel functions
    Kanzawa, Yuchi
    Endo, Yasunori
    Miyamoto, Sadaaki
    2006 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS, VOLS 1-5, 2006, : 744 - +
  • [35] Fuzzy kernel K-medoids clustering algorithm for uncertain data objects
    Behnam Tavakkol
    Youngdoo Son
    Pattern Analysis and Applications, 2021, 24 : 1287 - 1302
  • [36] Missing information in imbalanced data stream: fuzzy adaptive imputation approach
    Halder, Bohnishikha
    Ahmed, Md Manjur
    Amagasa, Toshiyuki
    Isa, Nor Ashidi Mat
    Faisal, Rahat Hossain
    Rahman, Md Mostafijur
    APPLIED INTELLIGENCE, 2022, 52 (05) : 5561 - 5583
  • [37] Fuzzy kernel K-medoids clustering algorithm for uncertain data objects
    Tavakkol, Behnam
    Son, Youngdoo
    PATTERN ANALYSIS AND APPLICATIONS, 2021, 24 (03) : 1287 - 1302
  • [38] Missing information in imbalanced data stream: fuzzy adaptive imputation approach
    Bohnishikha Halder
    Md Manjur Ahmed
    Toshiyuki Amagasa
    Nor Ashidi Mat Isa
    Rahat Hossain Faisal
    Md. Mostafijur Rahman
    Applied Intelligence, 2022, 52 : 5561 - 5583
  • [39] Locally adaptive multiple kernel clustering
    Zhang, Lujiang
    Hu, Xiaohui
    NEUROCOMPUTING, 2014, 137 : 192 - 197
  • [40] Kernel Density Estimation with Missing Data: Misspecifying the Missing Data Mechanism
    Dubnicka, Suzanne R.
    NONPARAMETRIC STATISTICS AND MIXTURE MODELS: A FESTSCHRIFT IN HONOR OF THOMAS P HETTMANSPERGER, 2011, : 114 - 135