A Novel Committee-Based Clustering Method

Cited by: 1
Authors
Fiol-Gonzalez, Sonia [1]
Almeida, Cassio [1, 2]
Barbosa, Simone [1]
Lopes, Helio [1]
Affiliations
[1] Pontificia Univ Catolica Rio de Janeiro, Dept Informat, Rio De Janeiro, Brazil
[2] Inst Brasileiro Geog & Estat, ENCE, Rio De Janeiro, Brazil
Source
BIG DATA ANALYTICS AND KNOWLEDGE DISCOVERY (DAWAK 2018) | 2018 / Vol. 11031
Keywords
Feature selection; Clustering methods; Similarity matrix; Ensemble methods; Unsupervised learning; COMBINING MULTIPLE CLUSTERINGS; FEATURE-SELECTION
DOI
10.1007/978-3-319-98539-8_10
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Clustering algorithms play an important role in data analysis. Applying them successfully requires identifying the relevant features in the original dataset, a problem for which the literature offers efficient feature selection techniques. Choosing an adequate number of clusters is a second well-known difficulty. This paper proposes a new ensemble clustering method comprising three stages: the first generates the clustering ensemble, the second combines the results of the multiple clustering scenarios generated, and the third creates a new partition from the combined data. To generate the ensemble, the method combines feature selection strategies with clustering under various numbers of clusters; the combined results form a similarity matrix, which is then used to compute the final clustering output. Experiments on seven well-known datasets showed the effectiveness of the proposed technique.
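The three stages named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say which feature selection strategies, base clusterer, or final partitioning step the paper uses, so the sketch assumes random feature subsets, a plain k-means, a co-association (co-occurrence frequency) similarity matrix, and a thresholded connected-components step. All function names and parameters (`kmeans`, `co_association`, `final_partition`, `n_runs`, `ks`, `threshold`) are hypothetical.

```python
import random


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means on a list of tuples; returns one label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; empty clusters keep theirs.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def co_association(data, n_runs=30, ks=(2, 3, 4), seed=0):
    """Stages 1 and 2: build an ensemble, then combine it into a similarity matrix."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    sim = [[0.0] * n for _ in range(n)]
    for _ in range(n_runs):
        # Stage 1 (assumption): a random feature subset stands in for a
        # feature selection strategy, and k is drawn from a small candidate set.
        feats = rng.sample(range(d), rng.randint(1, d))
        k = rng.choice(ks)
        projected = [tuple(row[f] for f in feats) for row in data]
        labels = kmeans(projected, k, rng)
        # Stage 2: count how often each pair of points shares a cluster.
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    sim[i][j] += 1.0 / n_runs
    return sim


def final_partition(sim, threshold=0.5):
    """Stage 3 (assumption): link pairs with similarity >= threshold and
    return the connected components as the final clusters."""
    n = len(sim)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and sim[i][j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Toy data: two groups of points in three dimensions (made up for illustration).
data = [
    (0.0, 0.1, 5.0), (0.2, 0.0, 5.1), (0.1, 0.2, 4.9), (0.0, 0.0, 5.2),
    (9.0, 9.1, 5.0), (9.2, 9.0, 5.1), (9.1, 9.2, 4.9), (9.0, 9.0, 5.2),
]
sim = co_association(data)
labels = final_partition(sim)
```

Each entry `sim[i][j]` is the fraction of ensemble runs in which points `i` and `j` were placed in the same cluster, so the matrix is symmetric with ones on the diagonal. Any clusterer that accepts a precomputed similarity could replace the thresholding step.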
Pages: 126-136
Number of pages: 11