Genetic Programming for Evolving Similarity Functions Tailored to Clustering Algorithms

Cited by: 0
Authors
Andersen, Hayden [1]
Lensen, Andrew [1]
Xue, Bing [1]
Affiliation
[1] Victoria Univ Wellington, Sch Engn & Comp Sci, Wellington, New Zealand
Source
2021 IEEE Congress on Evolutionary Computation (CEC 2021) | 2021
Keywords
Clustering; Genetic Programming; Similarity Function; Feature Selection
DOI
10.1109/CEC45853.2021.9504855
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Clustering is the process of grouping related instances of unlabelled data into distinct subsets called clusters. Although many clustering methods exist, almost all of them rely on simple distance-based (dis)similarity functions such as Euclidean distance. These and most other predefined dissimilarity functions can be rather inflexible: they weight every feature equally and do not properly capture feature interactions in the data. Genetic Programming is an evolutionary computation approach that evolves programs iteratively, which lends itself naturally to evolving functions. This paper introduces a novel framework that automatically evolves dissimilarity measures for a given clustering dataset and algorithm. The results show that the evolved functions produce clusters that score highly on measures of cluster quality.
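To make the idea in the abstract concrete, the sketch below shows how a dissimilarity function represented as an expression tree could replace Euclidean distance in a clustering step. It is an illustration only, not the authors' framework: the function set, the tree encoding, the hand-written `EXAMPLE_TREE`, and the helper `assign_to_medoids` are all assumptions made for this example, and no evolution is performed here (a Genetic Programming run would search over such trees instead of using a fixed one).

```python
import math
import operator

# Hypothetical function set for the expression trees
# (an assumption for illustration, not taken from the paper).
FUNCTIONS = {
    "add": operator.add,
    "mul": operator.mul,
    "absdiff": lambda a, b: abs(a - b),
}

def evaluate(tree, x, y):
    """Recursively evaluate an expression tree on a pair of instances.

    A tree is a numeric constant, ("x", i) or ("y", i) for feature i of
    the first or second instance, or (function_name, left, right).
    """
    if isinstance(tree, (int, float)):
        return float(tree)
    head = tree[0]
    if head == "x":
        return x[tree[1]]
    if head == "y":
        return y[tree[1]]
    return FUNCTIONS[head](evaluate(tree[1], x, y), evaluate(tree[2], x, y))

def euclidean(x, y):
    """Standard Euclidean distance: every feature contributes equally."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def assign_to_medoids(data, medoids, dissimilarity):
    """Label each instance with the index of its least-dissimilar medoid."""
    return [
        min(range(len(medoids)), key=lambda k: dissimilarity(p, medoids[k]))
        for p in data
    ]

# Hand-written stand-in for an evolved tree:
#   2*|x0 - y0| + |x0 - y0| * |x1 - y1|
# It ignores feature 2 and includes an interaction between features 0 and 1.
D0 = ("absdiff", ("x", 0), ("y", 0))
D1 = ("absdiff", ("x", 1), ("y", 1))
EXAMPLE_TREE = ("add", ("mul", 2.0, D0), ("mul", D0, D1))

def tree_dissimilarity(x, y):
    return evaluate(EXAMPLE_TREE, x, y)

# Toy data: features 0 and 1 carry the group structure, feature 2 is noise.
data = [(0.0, 0.0, 9.0), (0.1, 0.2, -9.0), (5.0, 5.0, 9.0), (5.2, 4.9, -9.0)]
medoids = [data[0], data[3]]

labels_tree = assign_to_medoids(data, medoids, tree_dissimilarity)
labels_euclid = assign_to_medoids(data, medoids, euclidean)
print(labels_tree)    # [0, 0, 1, 1]
print(labels_euclid)  # [0, 1, 0, 1]
```

On this toy data the Euclidean assignment is dominated by the noisy third feature, while the tree-based function groups instances by the first two features. This is the kind of flexibility the abstract attributes to evolved dissimilarity measures, shown here with a function chosen by hand rather than evolved.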
Pages: 688-695
Page count: 8