Theoretical Analysis of the Generalization Error of the Sampling-Based Fuzzy C-Means

Cited by: 2
Authors
Zhang, Zhongjie [1]
Huang, Jian [1]
Affiliations
[1] Natl Univ Def Technol, Coll Intelligence Sci & Technol, Changsha 410073, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Germanium; Upper bound; Computers; Data mining; Clustering algorithms; Optimization; Picture archiving and communication systems; Fuzzy C-means (FCM); generalization error (GE); Hoeffding's inequality; probably approximately correct (PAC); sampling; VC dimension;
DOI
10.1109/TFUZZ.2020.2990100
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Many studies use sampling to apply fuzzy C-means (FCM) to very large data sets. However, compared with the many algorithms proposed for sampling-based FCM, few works study its generalization error theoretically. In this article, we use the framework of probably approximately correct (PAC) learning to study it, and define the generalization error as the maximum difference between the empirical risk and the real risk over all solutions in the solution space. First, we analyze this generalization error for a finite solution space and prove two theorems using Hoeffding's inequality, one of which relies on a reasonable hypothesis. Then, we discuss the case in which the solution space is infinite, and propose a theorem and a corollary under a second hypothesis, using the finite-space results in the proofs. Finally, we bound the generalization error from the perspective of the FCM algorithm's convergence: we use a Taylor expansion to transform the risk function into a linear form and estimate an upper bound on its Vapnik-Chervonenkis (VC) dimension. The hypotheses proposed in this article are all intuitive and reflect phenomena commonly observed in practice. Our results give an upper bound on the generalization error that holds with at least a given probability, which offers insight into the stability of sampling-based FCM and can guide its application.
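As a rough illustration of the finite-solution-space setting mentioned in the abstract, the sketch below evaluates the textbook bound obtained by combining Hoeffding's inequality with a union bound over a finite set of candidate solutions. It is not a reproduction of the article's theorems, whose exact statements and hypotheses are not given in this record. The function names, the assumption that the per-sample risk lies in an interval of width `loss_range`, and the example numbers are all hypothetical.

```python
import math


def hoeffding_union_bound(n_samples, n_solutions, delta, loss_range=1.0):
    """Textbook uniform deviation bound for a finite solution set.

    Assumes (hypothetically) that each sample's risk lies in an interval of
    width `loss_range` and that samples are drawn i.i.d. With probability at
    least 1 - delta, every one of the `n_solutions` candidates has
    |empirical risk - real risk| <= the returned epsilon.
    """
    if n_samples <= 0 or n_solutions <= 0:
        raise ValueError("n_samples and n_solutions must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be in (0, 1)")
    # Solve 2 * |H| * exp(-2 * n * eps^2 / c^2) = delta for eps.
    return loss_range * math.sqrt(
        math.log(2.0 * n_solutions / delta) / (2.0 * n_samples)
    )


def samples_needed(epsilon, n_solutions, delta, loss_range=1.0):
    """Smallest sample size for which the bound above is at most epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be in (0, 1)")
    return math.ceil(
        loss_range ** 2 * math.log(2.0 * n_solutions / delta)
        / (2.0 * epsilon ** 2)
    )


if __name__ == "__main__":
    # Hypothetical numbers: 1000 candidate solutions, 95% confidence.
    for n in (1_000, 10_000, 100_000):
        print(n, hoeffding_union_bound(n, n_solutions=1000, delta=0.05))
    print(samples_needed(0.05, n_solutions=1000, delta=0.05))
```

Under these assumptions the bound shrinks at rate 1/sqrt(n) and grows only logarithmically with the number of candidate solutions, which is why sampling a modest fraction of a very large data set can still give a tight estimate of the risk.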
Pages: 2432-2437
Page count: 6
Related papers
50 in total
  • [1] Generalization of Fuzzy C-Means Based on Neutrosophic Logic
    Hassanien, Aboul Ella
    Basha, Sameh H.
    Abdalla, Areeg S.
    STUDIES IN INFORMATICS AND CONTROL, 2018, 27 (01): : 43 - 54
  • [2] Information Theoretical Importance Sampling Clustering and Its Relationship With Fuzzy C-Means
    Zhang, Jiangshe
    Ji, Lizhen
    Wang, Meng
    IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2024, 32 (04) : 2164 - 2175
  • [3] Generalization rules for the suppressed fuzzy c-means clustering algorithm
    Szilagyi, Laszlo
    Szilagyi, Sandor M.
    NEUROCOMPUTING, 2014, 139 : 298 - 309
  • [4] A Generalization of Fuzzy c-Means with Variables Controlling Cluster Size
    Kanzawa, Yuchi
    MODELING DECISIONS FOR ARTIFICIAL INTELLIGENCE, MDAI 2023, 2023, 13890 : 226 - 237
  • [5] A Generalization of Distance Functions for Fuzzy c-Means Clustering With Centroids of Arithmetic Means
    Wu, Junjie
    Xiong, Hui
    Liu, Chen
    Chen, Jian
    IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2012, 20 (03) : 557 - 571
  • [6] Random sampling fuzzy c-means clustering and recursive least square based fuzzy identification
    Lu, Pingli
    Yang, Ying
    Ma, Wenbo
    2006 AMERICAN CONTROL CONFERENCE, VOLS 1-12, 2006, 1-12 : 5049 - +
  • [7] Parameter selections of fuzzy c-means based on robust analysis
    Wu, Kuo-Lung
    World Academy of Science, Engineering and Technology, 2010, 41 : 554 - 557
  • [8] Parameter selections of fuzzy C-means based on robust analysis
    Wu, Kuo-Lung
    World Academy of Science, Engineering and Technology, 2010, 65 : 554 - 557
  • [9] Unsupervised classification based on fuzzy c-means with uncertainty analysis
    Wang, Qunming
    Shi, Wenzhong
    REMOTE SENSING LETTERS, 2013, 4 (11) : 1087 - 1096
  • [10] A generalization of Possibilistic Fuzzy C-Means Method for Statistical Clustering of Data
    Azzouzi S.
    El-Mekkaoui J.
    Hjouji A.
    Khalfi A.E.L.
    International Journal of Circuits, Systems and Signal Processing, 2021, 15 : 1766 - 1780