Learnable Weighting of Intra-Attribute Distances for Categorical Data Clustering with Nominal and Ordinal Attributes

Cited by: 24

Authors:

Zhang, Yiqun [1,2]

Cheung, Yiu-ming [2]

Affiliations:

[1] Guangdong Univ Technol, Sch Comp, Guangzhou 510080, Peoples R China

[2] Hong Kong Baptist Univ, Dept Comp Sci, Hong Kong, Peoples R China

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2022, Vol. 44, No. 7

Funding:

National Natural Science Foundation of China;

Keywords:

Clustering algorithms; Weight measurement; Measurement; Loss measurement; Encoding; Task analysis; Partitioning algorithms; Categorical data clustering; nominal-and-ordinal attribute; intra-attribute distance; learnable weighting; K-MEANS ALGORITHM; SIMILARITY;

DOI:

10.1109/TPAMI.2021.3056510

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The success of categorical data clustering relies heavily on the distance metric that measures the degree of dissimilarity between two objects. However, most existing clustering methods treat the two categorical subtypes, i.e., nominal and ordinal attributes, in the same way when calculating dissimilarity, without considering the relative order information of the ordinal values. Moreover, interdependence may exist among the nominal and ordinal attributes, which is worth exploring as an indicator of dissimilarity. This paper therefore studies the intrinsic difference and connection between nominal and ordinal attribute values from a graph-like perspective. Accordingly, we propose a novel distance metric that measures the intra-attribute distances of nominal and ordinal attributes in a unified way while preserving the order relationship among ordinal values. Subsequently, we propose a new clustering algorithm that integrates the learning of intra-attribute distance weights and the partitioning of data objects into a single learning paradigm rather than two separate steps, thereby circumventing a suboptimal solution. Experiments show the efficacy of the proposed algorithm in comparison with existing counterparts.

Pages: 3560-3576

Page count: 17
