Hexadecimal Aggregate Approximation Representation and Classification of Time Series Data

Cited by: 3

Authors:

He, Zhenwen ^{[1]}

Zhang, Chunfeng ^{[1]}

Ma, Xiaogang ^{[2]}

Liu, Gang ^{[1]}

Affiliations:

[1] China Univ Geosci Wuhan, Sch Comp Sci, 388 Lumo Rd, Wuhan 430074, Peoples R China

[2] Univ Idaho, Dept Comp Sci, 875 Perimeter Dr,MS 1010, Moscow, ID 83844 USA

Source:

ALGORITHMS | 2021 / Vol. 14 / Issue 12

Funding:

National Natural Science Foundation of China;

Keywords:

time series; SAX; PAA; HAX; PAX; DIMENSIONALITY REDUCTION; SYMBOLIC REPRESENTATION; SIMILARITY MEASURES; DISTANCE; SEARCH; SETS;

DOI:

10.3390/a14120353

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Time series data are widely found in finance, health, environmental, social, mobile and other fields. A large amount of time series data has been produced due to the general use of smartphones, various sensors, RFID and other internet devices. How a time series is represented is key to the efficient and effective storage and management of time series data, and is also very important to time series classification. Two new time series representation methods, Hexadecimal Aggregate approXimation (HAX) and Point Aggregate approXimation (PAX), are proposed in this paper. The two methods represent each segment of a time series as a transformable interval object (TIO). Then, each TIO is mapped to a spatial point located on a two-dimensional plane. Finally, HAX maps each point to a hexadecimal digit so that a time series is converted into a hex string. The experimental results show that HAX has higher classification accuracy than Symbolic Aggregate approXimation (SAX) but lower accuracy than some SAX variants (SAX-TD, SAX-BD). HAX has the same space cost as SAX, which is lower than that of these variants. PAX has higher classification accuracy than HAX, and its accuracy is extremely close to that of the Euclidean distance (ED) measurement; however, the space cost of PAX is generally much lower than the space cost of ED. HAX and PAX are general representation methods that can also support clustering, indexing and querying of geoscience time series, in addition to classification.
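
The pipeline summarized in the abstract (segment, interval object, 2D point, hex digit) can be illustrated with a rough sketch. The abstract does not say how a TIO is built from a segment or how the plane is divided, so everything specific here is an assumption made for illustration, not the authors' definition: the interval is taken as the segment's (min, max) after z-normalization, and the plane is cut into a 4 x 4 grid using the Gaussian quartile breakpoints commonly used by SAX with an alphabet of 4. The function names (`hax_like`, `segment_to_point`, `point_to_hex`) are hypothetical.

```python
import bisect
import statistics

# ASSUMPTION: Gaussian quartile cut points, as used by SAX with alphabet size 4.
BREAKPOINTS = [-0.6745, 0.0, 0.6745]


def z_normalize(series):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    if std == 0:
        return [0.0 for _ in series]  # constant series: map every value to 0
    return [(x - mean) / std for x in series]


def split_segments(series, n_segments):
    """Split a series into n_segments contiguous, non-empty chunks."""
    n = len(series)
    bounds = [(i * n) // n_segments for i in range(n_segments + 1)]
    return [series[bounds[i]:bounds[i + 1]] for i in range(n_segments)]


def segment_to_point(segment):
    """ASSUMPTION: treat the segment as the interval (min, max), i.e. a 2D point."""
    return (min(segment), max(segment))


def point_to_hex(point):
    """Quantize each coordinate into 4 bins and return the cell index as a hex digit."""
    col = bisect.bisect_right(BREAKPOINTS, point[0])
    row = bisect.bisect_right(BREAKPOINTS, point[1])
    return format(4 * row + col, "x")


def hax_like(series, n_segments):
    """Convert a numeric series into a hex string with one digit per segment."""
    if not 0 < n_segments <= len(series):
        raise ValueError("n_segments must be between 1 and len(series)")
    normalized = z_normalize(series)
    return "".join(
        point_to_hex(segment_to_point(segment))
        for segment in split_segments(normalized, n_segments)
    )


if __name__ == "__main__":
    print(hax_like([0, 0, 0, 0, 10, 10, 10, 10], 2))
```

Each segment costs one hex digit (4 bits), which is consistent with the abstract's statement that HAX has the same space cost as SAX. Under the (min, max) assumption used here, only grid cells on or above the diagonal can occur, so this sketch does not use all 16 digits; the mapping in the paper may differ.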


Pages: 23

References (69 in total):

[1] A review on distance based time series classification
Abanda, Amaia
Mori, Usue
Lozano, Jose A.
[J]. DATA MINING AND KNOWLEDGE DISCOVERY, 2019, 33 (02) : 378 - 412
[2] Time-series clustering - A decade review
Aghabozorgi, Saeed
Shirkhorshidi, Ali Seyed
Teh Ying Wah
[J]. INFORMATION SYSTEMS, 2015, 53 : 16 - 38
[3] Agrawal R., 1993, Foundations of Data Organization and Algorithms. 4th International Conference. FODO '93 Proceedings, P69
[4] AN INTRODUCTION TO KERNEL AND NEAREST-NEIGHBOR NONPARAMETRIC REGRESSION
ALTMAN, NS
[J]. AMERICAN STATISTICIAN, 1992, 46 (03) : 175 - 185
[5] [Anonymous], 2001, EUR C PRINC DAT MIN, DOI [10.1007/3-540-44794-610, DOI 10.1007/3-540-44794-610]
[6] [Anonymous], 2000, Vldb
[7] [Anonymous], 2001, INT JOINT C ART INT
[8] [Anonymous], 2007, Proceedings of the 33rd International Conference on Very Large Data Bases. VLDB'07
[9] A soft computing framework for classifying time series based on fuzzy sets of events
Ares, Juan
Lara, Juan A.
Lizcano, David
Suarez, Sonia
[J]. INFORMATION SCIENCES, 2016, 330 : 125 - 144
[10] Analysing time series structure with hidden Markov models
Azzouzi, M
Nabney, IT
[J]. NEURAL NETWORKS FOR SIGNAL PROCESSING VIII, 1998, : 402 - 408
