Mathematical Programming Formulations and Algorithms for Discrete k-Median Clustering of Time-Series Data

被引：14

作者：

Seref, Onur ^{[1
]}

Fan, Ya-Ju ^{[2
]}

Chaovalitwongse, Wanpracha Art ^{[3
,4
]}

机构：

[1] Virginia Polytech Inst & State Univ, Dept Business Informat Technol, Blacksburg, VA 24061 USA

[2] Lawrence Livermore Natl Lab, Livermore, CA 94551 USA

[3] Univ Washington, Dept Ind & Syst Engn, Seattle, WA 98195 USA

[4] Univ Washington, Dept Radiol, Seattle, WA 98195 USA

来源：

INFORMS JOURNAL ON COMPUTING | 2014年 / 26卷 / 01期

基金：

美国国家科学基金会;

关键词：

clustering; optimization; discrete k-median; mixed-integer programming; uncoupled bilinear program algorithm; sequential optimization; time series; HUB LOCATION;

D O I：

10.1287/ijoc.2013.0554

中图分类号：

TP39 [计算机的应用];

学科分类号：

081203 ; 0835 ;

摘要：

Discrete k-median (DKM) clustering problems arise in many real-life applications that involve time-series data sets, in which nondiscrete clustering methods may not represent the problem domain adequately. In this study, we propose mathematical programming formulations and solution methods to efficiently solve the DKM clustering problem. We develop approximation algorithms from a bilinear formulation of the discrete k-median problem using an uncoupled bilinear program algorithm. This approximation algorithm, which we refer to as DKM-L, is composed of two alternating linear programs, where one can be solved in linear time and the other is a minimum cost assignment problem. We then modify this algorithm by replacing the assignment problem with an efficient sequential algorithm for a faster approximation, which we call DKM-S. We also propose a compact exact integer formulation, DKM-I, and a more efficient network design-based exact mixed-integer formulation, DKM-M. All of our methods use arbitrary pairwise distance matrices as input. We apply our methods to simulated single-variate and multivariate random walk time-series data. We report comparative clustering performances using normalized mutual information (NMI) and solution speeds among the DKM methods we propose. We also compare our methods to other clustering algorithms that can operate with distance matrices, such as hierarchical cluster trees (HCT) and partition around medoids (PAM). We present NMI scores and classification accuracies of our DKM algorithms compared to HCT and PAM using five different distance measures on simluated data, as well as public benchmark and real-life neural time-series data sets. We show that DKM-S is much faster than HCT, PAM, and all other DKM methods and produces consistently good clustering results on all data sets.

引用

页码：160 / 172

页数：13

共 41 条

[1] Approximation Algorithms for Min-Sum k-Clustering and Balanced k-Median
Babak Behsaz
Zachary Friggstad
Mohammad R. Salavatipour
Rohit Sivakumar
Algorithmica, 2019, 81 : 1006 - 1030
[2] A k-Median Algorithm with Running Time Independent of Data Size
Adam Meyerson
Liadan O'Callaghan
Serge Plotkin
Machine Learning, 2004, 56 : 61 - 87
[3] A k-median algorithm with running time independent of data size
Meyerson, A
O'Callaghan, L
Plotkin, S
MACHINE LEARNING, 2004, 56 (1-3) : 61 - 87
[4] Clustering multivariate time-series data
Singhal, A
Seborg, DE
JOURNAL OF CHEMOMETRICS, 2005, 19 (08) : 427 - 438
[5] Time-Series Clustering for Data Analysis in Smart Grid
Maurya, Akanksha
Akyurek, Alper Sinan
Aksanli, Baris
Rosing, Tajana Simunic
2016 IEEE INTERNATIONAL CONFERENCE ON SMART GRID COMMUNICATIONS (SMARTGRIDCOMM), 2016,
[6] Clustering Time Series with k-Medoids Based Algorithms
Holder, Christopher
Guijo-Rubio, David
Bagnall, Anthony
ADVANCED ANALYTICS AND LEARNING ON TEMPORAL DATA, AALTD 2023, 2023, 14343 : 39 - 55
[7] Clustering Structure Analysis in Time-Series Data With Density-Based Clusterability Measure
Jokinen, Juho
Raty, Tomi
Lintonen, Timo
IEEE-CAA JOURNAL OF AUTOMATICA SINICA, 2019, 6 (06) : 1332 - 1343
[8] Time Series Analysis of Oceanographic Data Using Clustering Algorithms
Kumar, D. J. Santosh
Vighneshwar, S. P.
Mishra, Tusar Kanti
Jampana, Satya, V
COMPUTER COMMUNICATION, NETWORKING AND INTERNET SECURITY, 2017, 5 : 245 - 252
[9] Approximate Clustering of Time-Series Datasets using k-Modes Partitioning
Aghabozorgi, Saeed
Teh Ying Wah
JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 2015, 31 (01) : 207 - 228
[10] Clustering Time-Series Gene Expression Data with Unequal Time Intervals
Rueda, Luis
Bari, Ataul
Ngom, Alioune
TRANSACTIONS ON COMPUTATIONAL SYSTEMS BIOLOGY X, 2008, 5410 : 100 - 123

← 1 2 3 4 5 →