High Dimensional Sparse data Clustering Algorithm Based on Concept Feature Vector (CABOCFV)

Cited by: 0
Authors
Wu, Sen [1]
Gu, Shujuan [1]
Gao, Xuedong [1]
Affiliations
[1] Univ Sci & Technol Beijing, Sch Econ & Management, Beijing 100083, Peoples R China
Source
IEEE/SOLI'2008: PROCEEDINGS OF 2008 IEEE INTERNATIONAL CONFERENCE ON SERVICE OPERATIONS AND LOGISTICS, AND INFORMATICS, VOLS 1 AND 2 | 2008
Keywords
Clustering Analysis; High Dimensional Data; Concept Lattice Construction
DOI
10.1109/SOLI.2008.4686391
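Chinese Library Classification number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Subject classification code
0808; 0809
Abstract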
Finding clusters of data objects in high-dimensional space is challenging, especially because such data can be sparse and highly skewed. This paper focuses on using the concept lattice to solve the high-dimensional sparse data clustering problem. Concept lattice theory is an effective tool for data analysis and knowledge processing: it integrates the concept intent (attributes) and the concept extent (objects), and describes the hierarchical relationship between concept nodes. Constructing a concept lattice is itself a process of concept clustering, but because the lattice is complete it produces a huge number of concept nodes, and concept nodes whose extent is too large or too small are of no interest. This paper proposes an effective high-dimensional sparse data Clustering Algorithm Based On Concept Feature Vector (CABOCFV), which reduces the redundancy of concept construction using the 'Concept Sparse Feature Distance' and the 'Concept Feature Vector', and introduces an effective noise recognition strategy. The CABOCFV clustering algorithm is insensitive to the input order of data objects and scans the database only once. Experiments show that CABOCFV is effective and efficient for high-dimensional sparse data clustering.
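The intent/extent terminology in the abstract can be illustrated with a minimal Python sketch of standard formal concept analysis on a toy binary context. This is not CABOCFV: the abstract does not define the 'Concept Sparse Feature Distance' or the 'Concept Feature Vector', so neither is modeled here. The data, the function names, and the extent-size thresholds are all assumptions made for illustration.

```python
from itertools import combinations

# Toy binary context (hypothetical data): object -> attributes it possesses.
CONTEXT = {
    "o1": {"a", "b"},
    "o2": {"a", "b", "c"},
    "o3": {"c"},
}
ATTRIBUTES = {"a", "b", "c"}


def common_attributes(objects):
    """Intent: the attributes shared by every object in `objects`."""
    shared = set(ATTRIBUTES)
    for obj in objects:
        shared &= CONTEXT[obj]
    return shared


def matching_objects(attributes):
    """Extent: the objects that possess every attribute in `attributes`."""
    return {obj for obj, attrs in CONTEXT.items() if attributes <= attrs}


def formal_concepts():
    """Enumerate all (extent, intent) pairs by closing every object subset.

    The brute-force loop over subsets is exponential in the number of
    objects, which hints at why a full lattice becomes very large.
    """
    concepts = set()
    names = sorted(CONTEXT)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            intent = common_attributes(subset)
            extent = matching_objects(intent)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


def filter_by_extent_size(concepts, min_size, max_size):
    """Keep concepts whose extent size lies in [min_size, max_size].

    The thresholds are hypothetical; the abstract only says that extents
    that are too large or too small are uninteresting.
    """
    return {c for c in concepts if min_size <= len(c[0]) <= max_size}


if __name__ == "__main__":
    for extent, intent in sorted(formal_concepts(), key=lambda c: len(c[0])):
        print(sorted(extent), sorted(intent))
```

Calling `filter_by_extent_size(formal_concepts(), 2, 2)` on this toy context discards the concept covering all objects and the concept covering a single object, which mirrors the abstract's remark about extents that are too large or too small.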
Pages: 202-206
Page count: 5