Improved metaheuristics for the quartet method of hierarchical clustering

被引:2
作者
Consoli, Sergio [1 ]
Korst, Jan [1 ]
Pauws, Steffen [1 ,2 ]
Geleijnse, Gijs [3 ]
机构
[1] Philips Res, High Tech Campus 34, NL-5656 AE Eindhoven, Netherlands
[2] Tilburg Univ, TiCC, Warandelaan 2, NL-5037 AB Tilburg, Netherlands
[3] Netherlands Comprehens Canc Org IKNL, Zernikestr 29, NL-5612 HZ Eindhoven, Netherlands
关键词
Combinatorial optimization; NP-hardness; Quartet trees; Hierarchical clustering; Data mining; Metaheuristics; Graphs; Minimum quartet tree cost; TREES; SEARCH;
D O I
10.1007/s10898-019-00871-1
中图分类号
C93 [管理学]; O22 [运筹学];
学科分类号
070105 ; 12 ; 1201 ; 1202 ; 120202 ;
摘要
The quartet method is a novel hierarchical clustering approach where, given a set ofndata objects and their pairwise dissimilarities, the aim is to construct an optimal tree from the total number of possible combinations of quartet topologies onn, where optimality means that the sum of the dissimilarities of the embedded (or consistent) quartet topologies is minimal. This corresponds to an NP-hard combinatorial optimization problem, also referred to as minimum quartet tree cost (MQTC) problem. We provide details and formulation of this challenging problem, and propose a basic greedy heuristic that is characterized by some appealing insights and findings for speeding up and simplifying the processes of solution generation and evaluation, such as the use of adjacency-like matrices to represent the topology structures of candidate solutions; fast calculation of coefficients and weights of the solution matrices; shortcuts in the enumeration of all solution permutations for a given configuration; and an iterative distance matrix reduction procedure, which greedily merges together highly connected objects which may bring lower values of the quartet cost function in a given partial solution. It will be shown that this basic greedy heuristic is able to improve consistently the performance of popular quartet clustering algorithms in the literature, namely a reduced variable neighbourhood search and a simulated annealing metaheuristic, producing novel efficient solution approaches to the MQTC problem.
引用
收藏
页码:241 / 270
页数:30
相关论文
共 35 条
  • [1] Aarts E., 2005, Search Methodologies: Introductory Tutorials in Optimization and Decision Support Techniques, P187, DOI DOI 10.1007/0-387-28356-0_7
  • [2] Aarts E., 1989, Simulated annealing and Boltzmann machines: a stochastic approach to combinatorial optimization and neural computing
  • [3] [Anonymous], P ISWC 2006 WORKSH W
  • [4] Constructing phylogenies from quartets: Elucidation of eutherian superordinal relationships
    Ben-Dor, A
    Chor, B
    Graur, D
    Ophir, R
    Pelleg, D
    [J]. JOURNAL OF COMPUTATIONAL BIOLOGY, 1998, 5 (03) : 377 - 390
  • [5] Berry V, 1999, LECT NOTES COMPUT SC, V1643, P313
  • [6] Algorithmic clustering of music based on string compression
    Cilibrasi, R
    Vitányi, P
    de Wolf, R
    [J]. COMPUTER MUSIC JOURNAL, 2004, 28 (04) : 49 - 67
  • [7] Clustering by compression
    Cilibrasi, R
    Vitányi, PMB
    [J]. IEEE TRANSACTIONS ON INFORMATION THEORY, 2005, 51 (04) : 1523 - 1545
  • [8] Cilibrasi R, 2007, COMPLEARN TOOLKIT
  • [9] The Google similarity distance
    Cilibrasi, Rudi L.
    Vitanyi, Paul M. B.
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2007, 19 (03) : 370 - 383
  • [10] A Fast Quartet tree heuristic for hierarchical clustering
    Cilibrasi, Rudi L.
    Vitanyi, Paul M. B.
    [J]. PATTERN RECOGNITION, 2011, 44 (03) : 662 - 677