Mathematical Programming Formulations and Algorithms for Discrete k-Median Clustering of Time-Series Data

被引:14
|
作者
Seref, Onur [1 ]
Fan, Ya-Ju [2 ]
Chaovalitwongse, Wanpracha Art [3 ,4 ]
机构
[1] Virginia Polytech Inst & State Univ, Dept Business Informat Technol, Blacksburg, VA 24061 USA
[2] Lawrence Livermore Natl Lab, Livermore, CA 94551 USA
[3] Univ Washington, Dept Ind & Syst Engn, Seattle, WA 98195 USA
[4] Univ Washington, Dept Radiol, Seattle, WA 98195 USA
基金
美国国家科学基金会;
关键词
clustering; optimization; discrete k-median; mixed-integer programming; uncoupled bilinear program algorithm; sequential optimization; time series; HUB LOCATION;
D O I
10.1287/ijoc.2013.0554
中图分类号
TP39 [计算机的应用];
学科分类号
081203 ; 0835 ;
摘要
Discrete k-median (DKM) clustering problems arise in many real-life applications that involve time-series data sets, in which nondiscrete clustering methods may not represent the problem domain adequately. In this study, we propose mathematical programming formulations and solution methods to efficiently solve the DKM clustering problem. We develop approximation algorithms from a bilinear formulation of the discrete k-median problem using an uncoupled bilinear program algorithm. This approximation algorithm, which we refer to as DKM-L, is composed of two alternating linear programs, where one can be solved in linear time and the other is a minimum cost assignment problem. We then modify this algorithm by replacing the assignment problem with an efficient sequential algorithm for a faster approximation, which we call DKM-S. We also propose a compact exact integer formulation, DKM-I, and a more efficient network design-based exact mixed-integer formulation, DKM-M. All of our methods use arbitrary pairwise distance matrices as input. We apply our methods to simulated single-variate and multivariate random walk time-series data. We report comparative clustering performances using normalized mutual information (NMI) and solution speeds among the DKM methods we propose. We also compare our methods to other clustering algorithms that can operate with distance matrices, such as hierarchical cluster trees (HCT) and partition around medoids (PAM). We present NMI scores and classification accuracies of our DKM algorithms compared to HCT and PAM using five different distance measures on simluated data, as well as public benchmark and real-life neural time-series data sets. We show that DKM-S is much faster than HCT, PAM, and all other DKM methods and produces consistently good clustering results on all data sets.
引用
收藏
页码:160 / 172
页数:13
相关论文
共 41 条
  • [31] Microarray Time-Series Data Clustering Using Rough-Fuzzy C-Means Algorithm
    Maji, Pradipta
    Paul, Sushmita
    2011 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM 2011), 2011, : 269 - 272
  • [32] A clustering algorithm for detecting differential deviations in the multivariate time-series IoT data based on sensor relationship
    Idrees, Rabbia
    Maiti, Ananda
    Garg, Saurabh
    KNOWLEDGE AND INFORMATION SYSTEMS, 2025, 67 (03) : 2641 - 2690
  • [33] Time-series anomaly detection using dynamic programming based longest common subsequence on sensor data
    Nguyen, Thi Phuong Quyen
    Phuc, Phan Nguyen Ky
    Yang, Chao -Lung
    Sutrisno, Hendri
    Luong, Bao-Han
    Le, Thi Huynh Anh
    Nguyen, Thanh Tung
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 213
  • [34] Based on Correlation Analysis and K-Means: An Anomaly Detection Algorithm for Seasonal Time-Series Data
    Wang, Xin
    Yang, Yingxue
    Ding, Xueshuang
    Zhao, Yantao
    INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2024, 21 (06) : 978 - 986
  • [35] Motif-Based Method for Initialization the K-Means Clustering for Time Series Data
    Le Phu
    Duong Tuan Anh
    AI 2011: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2011, 7106 : 11 - 20
  • [36] A New Model for Learning-based forecasting Procedure by Combining k-means Clustering and Time Series forecasting Algorithms
    Hartomo K.D.
    Nataliani Y.
    PeerJ Computer Science, 2021, 7 : 1 - 29
  • [37] A self-supervised learning-based approach to clustering multivariate time-series data with missing values (SLAC-Time): An application to TBI phenotyping
    Ghaderi, Hamid
    Foreman, Brandon
    Nayebi, Amin
    Tipirneni, Sindhu
    Reddy, Chandan K.
    Subbian, Vignesh
    JOURNAL OF BIOMEDICAL INFORMATICS, 2023, 143
  • [38] TSPol-ASLIC: Adaptive Superpixel Generation With Local Iterative Clustering for Time-Series Quad- and Dual-Polarization SAR Data
    Gao, Han
    Wang, Changcheng
    Xiang, Deliang
    Ye, Jiawei
    Wang, Guanya
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [39] Classification of Rice Heavy Metal Stress Levels Based on Phenological Characteristics Using Remote Sensing Time-Series Images and Data Mining Algorithms
    Liu, Tianjiao
    Liu, Xiangnan
    Liu, Meiling
    Wu, Ling
    SENSORS, 2018, 18 (12)
  • [40] Winter wheat yield prediction using integrated Landsat 8 and Sentinel-2 vegetation index time-series data and machine learning algorithms
    Zhang, Haiyang
    Zhang, Yao
    Liu, Kaidi
    Lan, Shu
    Gao, Tinyao
    Li, Minzan
    COMPUTERS AND ELECTRONICS IN AGRICULTURE, 2023, 213