Hexadecimal Aggregate Approximation Representation and Classification of Time Series Data

Cited by: 3
Authors
He, Zhenwen [1]
Zhang, Chunfeng [1]
Ma, Xiaogang [2]
Liu, Gang [1]
Affiliations
[1] China Univ Geosci Wuhan, Sch Comp Sci, 388 Lumo Rd, Wuhan 430074, Peoples R China
[2] Univ Idaho, Dept Comp Sci, 875 Perimeter Dr,MS 1010, Moscow, ID 83844 USA
Funding
National Natural Science Foundation of China
Keywords
time series; SAX; PAA; HAX; PAX; dimensionality reduction; symbolic representation; similarity measures; distance; search; sets
DOI
10.3390/a14120353
CLC number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Time series data are widely found in finance, health, environmental, social, mobile and other fields. A large amount of time series data has been produced due to the general use of smartphones, various sensors, RFID and other internet devices. How a time series is represented is key to the efficient and effective storage and management of time series data, and is also very important to time series classification. Two new time series representation methods, Hexadecimal Aggregate approXimation (HAX) and Point Aggregate approXimation (PAX), are proposed in this paper. The two methods represent each segment of a time series as a transformable interval object (TIO). Then, each TIO is mapped to a spatial point located on a two-dimensional plane. Finally, HAX maps each point to a hexadecimal digit so that a time series is converted into a hex string. The experimental results show that HAX has higher classification accuracy than Symbolic Aggregate approXimation (SAX) but lower accuracy than some SAX variants (SAX-TD, SAX-BD). HAX has the same space cost as SAX, which is lower than that of these variants. PAX has higher classification accuracy than HAX and is extremely close to the Euclidean distance (ED) measurement; however, the space cost of PAX is generally much lower than the space cost of ED. HAX and PAX are general representation methods that can also support geoscience time series clustering, indexing and querying in addition to classification.
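The pipeline outlined in the abstract (segment, interval object, 2-D point, hexadecimal digit) can be illustrated with a minimal sketch. The abstract does not say how a TIO is built or how the plane is partitioned, so everything specific here is an assumption made for illustration rather than the authors' method: each segment is summarized by its (min, max) interval, that pair is used directly as the 2-D point, each coordinate is quantized into 4 equal-width bins over [-2, 2] after z-normalization, and the resulting 4 x 4 grid cell index becomes the hex digit. The function names (`hax_like`, `segment_to_point`, `quantize`) are hypothetical.

```python
from statistics import mean, pstdev


def z_normalize(series):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        # A constant series carries no shape information; map it to all zeros.
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def split_segments(series, w):
    """Split a series into w contiguous, nearly equal-length segments."""
    n = len(series)
    if not 1 <= w <= n:
        raise ValueError("w must satisfy 1 <= w <= len(series)")
    return [series[i * n // w:(i + 1) * n // w] for i in range(w)]


def segment_to_point(segment):
    """ASSUMPTION: summarize a segment by its [min, max] interval and
    use that pair directly as a point on a 2-D plane."""
    return (min(segment), max(segment))


def quantize(value, lo=-2.0, hi=2.0, bins=4):
    """ASSUMPTION: equal-width bins over [lo, hi]; out-of-range values are clamped."""
    if value <= lo:
        return 0
    if value >= hi:
        return bins - 1
    return int((value - lo) / (hi - lo) * bins)


def hax_like(series, w):
    """Convert a series into a string of w hex digits (illustrative only)."""
    digits = []
    for segment in split_segments(z_normalize(series), w):
        x, y = segment_to_point(segment)
        cell = quantize(x) * 4 + quantize(y)  # index of a cell in a 4 x 4 grid, 0..15
        digits.append(format(cell, "X"))
    return "".join(digits)


if __name__ == "__main__":
    print(hax_like([1.0, 2.0, 4.0, 3.0, 0.5, 0.2, 5.0, 6.0], 4))
```

Under these assumptions one hex digit is stored per segment, which is consistent with the abstract's statement that HAX has the same space cost as SAX; the actual point mapping and grid layout used in the paper may differ.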
Pages: 23