A Novel Lazy Time Series Classification Algorithm Based on the Shapelets

Cited by: 0
Authors
Wang Z.-H. [1]
Zhang W. [1]
Yuan J.-D. [1]
Liu H.-Y. [1]
Affiliations
[1] School of Computer and Information Technology, Beijing Jiaotong University, Beijing
Source
Jisuanji Xuebao/Chinese Journal of Computers | 2019 / Vol. 42 / No. 01
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation;
Keywords
Classification; Interpretability; Lazy learning; Shapelets; Time series;
DOI
10.11897/SP.J.1016.2019.00029
Abstract
Interpretable models have attracted growing attention in recent years because they help reveal the characteristics of data and explain how a classification model reaches its predictions. Massive time series data are available in many fields, such as weather forecasting, medical monitoring, and anomaly detection, and time series classification is an important research area within time series data mining. Unlike traditional attribute-vector data, a time series has no explicit attributes, and even with sophisticated feature selection techniques the dimensionality of the potential feature space remains beyond an acceptable range. This makes it challenging to learn a classification model that is both accurate and highly interpretable. Because the shapelet is a new primitive that can be used to construct interpretable models, shapelet-based time series classification has recently attracted considerable interest. Shapelet-based classification is a typical shape-based approach, and shapelets offer insight into the local discriminative features of time series. Shapelet-based models fall into two categories according to how the shapelets are used. The first uses the top-k shapelets to transform the original dataset into a much smaller yet more discriminative feature set, so that traditional classification algorithms can be applied to the transformed low-dimensional dataset. The second uses the selected shapelets to build the classification model directly. However, these global shapelet-based models have obvious shortcomings. First, a global model must create a candidate shapelet set that contains many redundant elements while extracting the best shapelet. Because of redundant instances and intra-class variation, the extracted shapelets are good for the training instances only in an average sense, so the resulting model may be neither suitable nor efficient for a given test case. Second, the shapelets obtained may come from different instances or be approximate solutions, so they cannot exactly capture the local characteristics of the test case. Third, because the class value of the test case's local features is unknown, the characteristics of test cases are always ignored.
To solve these problems, a data-driven local model based on shapelets is proposed that is built separately for each test case. In this model, local similarity rather than global similarity is the basis for classification. The local features of the test case are evaluated directly to find the most discriminative shapelet, which is then used to progressively reduce the search space of class attribute values. Because the shapelets come from the test example, they directly reflect its salient local features and can answer the question of why the model assigns a certain class value to the instance. Meanwhile, during shapelet evaluation, instances are selected to reduce the impact of redundant instances and intra-class variation. The lazy classification model presented in this paper is compared with two shapelet decision tree models, 1NN models based on different distance functions, and C4.5 models based on different top-k shapelet transformation algorithms. Experimental results show that the proposed model achieves higher accuracy and stronger interpretability. © 2019, Science Press. All rights reserved.
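The lazy, test-case-driven idea in the abstract can be illustrated with a minimal Python sketch. It is only one possible reading of the abstract, not the authors' algorithm: the function names (`subsequence_distance`, `best_split`, `lazy_classify`), the use of information gain as the quality measure, the fixed candidate lengths, and the majority-vote fallback are all assumptions made for illustration, and the instance-selection step mentioned in the abstract is omitted.

```python
import math
from collections import Counter

def subsequence_distance(shapelet, series):
    """Minimum Euclidean distance from shapelet to any equal-length window of series."""
    m = len(shapelet)
    best = math.inf
    for start in range(len(series) - m + 1):
        d = math.sqrt(sum((series[start + i] - shapelet[i]) ** 2 for i in range(m)))
        best = min(best, d)
    return best

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(distances, labels):
    """Return (information gain, threshold) of the best distance threshold."""
    pairs = sorted(zip(distances, labels), key=lambda p: p[0])
    base = entropy(labels)
    best_gain, best_thr = 0.0, None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal distances
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if base - weighted > best_gain:
            best_gain = base - weighted
            best_thr = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_gain, best_thr

def lazy_classify(test_series, train_series, train_labels, lengths=(3,)):
    """Classify one test series using shapelet candidates cut from that series only.

    Assumption: each round keeps the training instances on the near side of the
    best threshold, which shrinks the set of possible class values step by step.
    """
    remaining = list(zip(train_series, train_labels))
    while len({label for _, label in remaining}) > 1:
        labels = [label for _, label in remaining]
        best_gain, best_thr, best_dists = 0.0, None, None
        for m in lengths:
            for start in range(len(test_series) - m + 1):
                candidate = test_series[start:start + m]
                dists = [subsequence_distance(candidate, s) for s, _ in remaining]
                gain, thr = best_split(dists, labels)
                if thr is not None and gain > best_gain:
                    best_gain, best_thr, best_dists = gain, thr, dists
        if best_thr is None:
            break  # no candidate separates the remaining classes
        # The candidate has distance 0 to the test series, so keep the near side.
        remaining = [pair for pair, d in zip(remaining, best_dists) if d <= best_thr]
    # Fallback (an assumption): majority vote over whatever instances remain.
    return Counter(label for _, label in remaining).most_common(1)[0][0]
```

Because every candidate is a subsequence of the test series itself, the shapelet that ends up deciding the class can be shown to the user as the explanation for the prediction, which is the interpretability argument the abstract makes.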
Pages: 29-43
Page count: 14