Mining lake time series using symbolic representation

Cited by: 6
Authors
Ruan, Guangchen [1]
Hanson, Paul C. [2]
Dugan, Hilary A. [2]
Plale, Beth [1]
Affiliations
[1] Indiana Univ, Sch Informat & Comp, 919 E 10th St, Bloomington, IN 47408 USA
[2] Univ Wisconsin, Ctr Limnol, 680 North Pk St, Madison, WI 53706 USA
Funding
U.S. National Science Foundation (NSF)
Keywords
Lake time series; Symbolic representation; Mining; EVOLUTION; MODEL;
DOI
10.1016/j.ecoinf.2017.03.001
Chinese Library Classification (CLC) number
Q14 [Ecology (Bioecology)]
Discipline classification codes
071012; 0713
Abstract
Sensor networks deployed in lakes and reservoirs, when combined with simulation models and expert knowledge from the global community, are creating deeper understanding of the ecological dynamics of lakes. However, the amount of data and the complex patterns in the data demand substantial compute resources and efficient data mining algorithms, both of which are beyond the realm of traditional limnological research. This paper uniquely adapts methods from computer science for application to data intensive ecological questions, in order to provide ecologists with approachable methodology to facilitate knowledge discovery in lake ecology. We apply a state-of-the-art time series mining technique based on symbolic representation (SAX) to high-frequency time series of phycocyanin (PHYCO) and chlorophyll (CHLORO) fluorescence, both of which are indicators of algal biomass in lakes, as well as model predictions of algal biomass (MODEL). We use data mining techniques to demonstrate that MODEL predicts PHYCO better than it predicts CHLORO. All time series have high redundancy, resulting in a relatively small subset of unique patterns. However, MODEL is much less complex than either PHYCO or CHLORO and fails to reproduce high biomass periods indicative of algal blooms. We develop a set of tools in R to enable motif discovery and anomaly detection within a single lake time series, and relationship study among multiple lake time series through distance metrics, clustering and classification. Furthermore, to improve computation times, we provision web services to launch R tools remotely on high performance computing (HPC) resources. Comprehensive experimental results on observational and simulated lake data demonstrate the effectiveness of our approach. (C) 2017 Elsevier B.V. All rights reserved.
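As an illustration of the symbolic representation (SAX) the abstract refers to, the following is a minimal Python sketch of the standard SAX transform: z-normalize the series, reduce it with piecewise aggregate approximation (PAA), then map each segment mean to a letter using equiprobable breakpoints of the standard normal distribution. It is not the authors' R tooling; the function name `sax_word` and its parameters `n_segments` and `alphabet_size` are hypothetical names chosen for this example.

```python
from bisect import bisect_left
from statistics import NormalDist, fmean, pstdev


def sax_word(series, n_segments, alphabet_size):
    """Convert a numeric series to a SAX word (illustrative sketch only)."""
    values = [float(v) for v in series]
    n = len(values)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(series)")
    if not 2 <= alphabet_size <= 26:
        raise ValueError("alphabet_size must be between 2 and 26")

    # Step 1: z-normalize (a constant series maps to all zeros).
    mu, sigma = fmean(values), pstdev(values)
    z = [(v - mu) / sigma if sigma > 0 else 0.0 for v in values]

    # Step 2: PAA. Split into n_segments contiguous chunks and average each.
    # Integer boundaries keep every chunk non-empty because n >= n_segments.
    bounds = [i * n // n_segments for i in range(n_segments + 1)]
    paa = [fmean(z[bounds[i]:bounds[i + 1]]) for i in range(n_segments)]

    # Step 3: breakpoints that cut the standard normal into equiprobable bins.
    std_normal = NormalDist()
    breakpoints = [std_normal.inv_cdf(i / alphabet_size)
                   for i in range(1, alphabet_size)]

    # Step 4: map each PAA value to the letter of the bin it falls in.
    return "".join(chr(ord("a") + bisect_left(breakpoints, p)) for p in paa)


# A steadily rising series yields a rising word.
print(sax_word([1, 2, 3, 4, 5, 6, 7, 8], n_segments=4, alphabet_size=4))  # abcd
```

Once series such as PHYCO, CHLORO, and MODEL are reduced to words like this, repeated words within one series are candidate motifs and rare words are candidate anomalies, which is the kind of analysis the abstract describes.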
Pages: 10-22
Number of pages: 13