Graph-Based Dissimilarity Measurement for Cluster Analysis of Any-Type-Attributed Data

Cited by: 16
Authors
Zhang, Yiqun [1]
Cheung, Yiu-Ming [2]
Affiliations
[1] Guangdong Univ Technol, Sch Comp Sci & Technol, Guangzhou 510006, Peoples R China
[2] Hong Kong Baptist Univ, Dept Comp Sci, Hong Kong, Peoples R China
Keywords
Cluster analysis; dissimilarity measure; graph space; heterogeneous attributes; representation; EARTH MOVERS DISTANCE; SIMILARITY; ALGORITHM
DOI
10.1109/TNNLS.2022.3202700
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Heterogeneous attribute data, composed of attributes with different types of values, are common in a variety of real-world applications. Because data annotation is usually expensive, clustering offers a promising way to process unlabeled data, and the adopted similarity measure plays a key role in determining clustering accuracy. However, appropriately defining the similarity between data objects with heterogeneous attributes is very challenging, because values from heterogeneous attributes generally have very different characteristics. Specifically, numerical attributes take quantitative values, while categorical attributes take qualitative values. Furthermore, categorical attributes can be divided into nominal and ordinal ones according to whether their values carry order information. To bridge the gap among heterogeneous attributes, this article proposes a new dissimilarity metric for cluster analysis of such data. We first study the connections among the heterogeneous attributes and build graph representations for them. Then, a metric is proposed that computes the dissimilarities between attribute values under the guidance of the graph structures. Finally, we develop a new k-means-type clustering algorithm associated with the proposed metric. The proposed method can perform cluster analysis of datasets composed of an arbitrary combination of numerical, nominal, and ordinal attributes. Experimental results show its efficacy in comparison with its counterparts.
Pages: 6530-6544
Number of pages: 15