Mathematical Programming Formulations and Algorithms for Discrete k-Median Clustering of Time-Series Data

被引:14
|
作者
Seref, Onur [1 ]
Fan, Ya-Ju [2 ]
Chaovalitwongse, Wanpracha Art [3 ,4 ]
机构
[1] Virginia Polytech Inst & State Univ, Dept Business Informat Technol, Blacksburg, VA 24061 USA
[2] Lawrence Livermore Natl Lab, Livermore, CA 94551 USA
[3] Univ Washington, Dept Ind & Syst Engn, Seattle, WA 98195 USA
[4] Univ Washington, Dept Radiol, Seattle, WA 98195 USA
基金
美国国家科学基金会;
关键词
clustering; optimization; discrete k-median; mixed-integer programming; uncoupled bilinear program algorithm; sequential optimization; time series; HUB LOCATION;
D O I
10.1287/ijoc.2013.0554
中图分类号
TP39 [计算机的应用];
学科分类号
081203 ; 0835 ;
摘要
Discrete k-median (DKM) clustering problems arise in many real-life applications that involve time-series data sets, in which nondiscrete clustering methods may not represent the problem domain adequately. In this study, we propose mathematical programming formulations and solution methods to efficiently solve the DKM clustering problem. We develop approximation algorithms from a bilinear formulation of the discrete k-median problem using an uncoupled bilinear program algorithm. This approximation algorithm, which we refer to as DKM-L, is composed of two alternating linear programs, where one can be solved in linear time and the other is a minimum cost assignment problem. We then modify this algorithm by replacing the assignment problem with an efficient sequential algorithm for a faster approximation, which we call DKM-S. We also propose a compact exact integer formulation, DKM-I, and a more efficient network design-based exact mixed-integer formulation, DKM-M. All of our methods use arbitrary pairwise distance matrices as input. We apply our methods to simulated single-variate and multivariate random walk time-series data. We report comparative clustering performances using normalized mutual information (NMI) and solution speeds among the DKM methods we propose. We also compare our methods to other clustering algorithms that can operate with distance matrices, such as hierarchical cluster trees (HCT) and partition around medoids (PAM). We present NMI scores and classification accuracies of our DKM algorithms compared to HCT and PAM using five different distance measures on simluated data, as well as public benchmark and real-life neural time-series data sets. We show that DKM-S is much faster than HCT, PAM, and all other DKM methods and produces consistently good clustering results on all data sets.
引用
收藏
页码:160 / 172
页数:13
相关论文
共 41 条
  • [21] Evaluating multivariate time-series clustering using simulated ecological momentary assessment data
    Ntekouli, Mandani
    Spanakis, Gerasimos
    Waldorp, Lourens
    Roefs, Anne
    MACHINE LEARNING WITH APPLICATIONS, 2023, 14
  • [22] Clustering Microarray Time-series Data using Expectation Maximization and Multiple Profile Alignment
    Subhani, Numanul
    Rueda, Luis
    Ngom, Alioune
    Burden, Conrad J.
    BIBMW: 2009 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE WORKSHOP, 2009, : 1 - +
  • [23] Fourier Magnitude-Based Privacy-Preserving Clustering on Time-Series Data
    Kim, Hea-Suk
    Moon, Yang-Sae
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2010, E93D (06): : 1648 - 1651
  • [24] A new clustering method using wavelet based probability density functions for identifying patterns in time-series data
    Kordestani, Mojtaba
    Alkhateeb, Abedalrhman
    Rezaeian, Iman
    Rueda, Luis
    Saif, Mehrdad
    2016 IEEE EMBS INTERNATIONAL STUDENT CONFERENCE (ISC), 2016,
  • [25] Ant Based Clustering of Time Series Discrete Data - A Rough Set Approach
    Pancerz, Krzysztof
    Lewicki, Arkadiusz
    Tadeusiewicz, Ryszard
    SWARM, EVOLUTIONARY, AND MEMETIC COMPUTING, PT I, 2011, 7076 : 645 - +
  • [26] Time series k-means: A new k-means type smooth subspace clustering for time series data
    Huang, Xiaohui
    Ye, Yunming
    Xiong, Liyan
    Lau, Raymond Y. K.
    Jiang, Nan
    Wang, Shaokai
    INFORMATION SCIENCES, 2016, 367 : 1 - 13
  • [27] Model-based cell clustering and population tracking for time-series flow cytometry data
    Kodai Minoura
    Ko Abe
    Yuka Maeda
    Hiroyoshi Nishikawa
    Teppei Shimamura
    BMC Bioinformatics, 20
  • [28] Model-based cell clustering and population tracking for time-series flow cytometry data
    Minoura, Kodai
    Abe, Ko
    Maeda, Yuka
    Nishikawa, Hiroyoshi
    Shimamura, Teppei
    BMC BIOINFORMATICS, 2019, 20 (01)
  • [29] Hierarchical and nonhierarchical clustering of sequential data with Hidden Markov Models and its application to financial time-series data
    Shi Zhan
    PROCEEDINGS OF THE 4TH INTERNATIONAL CONFERENCE ON INNOVATION & MANAGEMENT, VOLS I AND II, 2007, : 920 - 933
  • [30] Time-Series Data Classification and Analysis Associated With Machine Learning Algorithms for Cognitive Perception and Phenomenon
    Jeong, Taikyeong
    IEEE ACCESS, 2020, 8 : 222417 - 222428