A Novel Lazy Time Series Classification Algorithm Based on the Shapelets

Cited by: 0
Authors
Wang Z.-H. [1]
Zhang W. [1]
Yuan J.-D. [1]
Liu H.-Y. [1]
Affiliation
[1] School of Computer and Information Technology, Beijing Jiaotong University, Beijing
Source
Jisuanji Xuebao/Chinese Journal of Computers | 2019, Vol. 42, No. 1
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
Classification; Interpretability; Lazy learning; Shapelets; Time series;
DOI
10.11897/SP.J.1016.2019.00029
Abstract
Interpretable models, which reveal the characteristics of the data and explain how a classifier reaches its predictions, have attracted growing attention in recent years. Massive time series data are available in many fields, such as weather forecasting, medical monitoring, and anomaly detection, and time series classification is an important research area within time series data mining. Unlike traditional attribute-vector data, a time series has no explicit attributes, and even with sophisticated feature selection techniques the dimensionality of the potential feature space remains beyond the acceptable range. This makes it challenging to learn a classification model that is both accurate and strongly interpretable. Because the shapelet is a new primitive that can be used to construct interpretable models, shapelet-based time series classification has recently attracted considerable interest. Shapelet-based classification is a typical shape-based approach, and shapelets give insight into the local discriminative features of a time series. According to how shapelets are used, shapelet-based models fall into two categories. The first uses the top-k shapelets to transform the original dataset into a much smaller yet more discriminative feature set, so that traditional classification algorithms can then be applied to the transformed low-dimensional dataset. The second uses the selected shapelets to build the classification model directly.
However, these global shapelet-based models have obvious shortcomings. First, to extract the best shapelet, a global model must always create a candidate shapelet set that contains a large number of redundant elements. Because of redundant instances and intra-class variation, the extracted shapelets are good for the training instances only in an average sense, so the resulting model may be neither suitable nor efficient for the test cases. Second, the shapelets obtained may come from different instances or may be approximate solutions, so they cannot exactly indicate the local characteristics of the test case. Third, because the class of a test case's local features is unknown, the characteristics of the test case are always ignored.
To address these problems, this paper proposes a data-driven local model that is built from shapelets for each test case. The model uses local similarity, rather than global similarity, as the basis for classification. The local features of the test case are evaluated directly to find the most discriminative shapelet, and that shapelet is then used to progressively narrow the search space of candidate class values. Because the shapelets come from the test example, they directly reflect its salient local features and can answer the question of why the model assigns a certain class value to the instance. Meanwhile, during shapelet evaluation, instances are selected to reduce the impact of redundant instances and intra-class variation. The proposed lazy classification model is compared with two shapelet decision tree models, 1NN models based on different distance functions, and C4.5 models based on different top-k shapelet transformation algorithms. Experimental results show that the proposed model has higher accuracy and stronger interpretability. © 2019, Science Press. All rights reserved.
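The abstract describes the lazy model only at a high level, so the following is a minimal sketch of the idea rather than the authors' algorithm. It assumes information gain as the shapelet quality measure, an unnormalized Euclidean subsequence distance, and a rule that keeps the training instances on the near side of the best threshold; all function names (`subsequence_distance`, `best_split`, `lazy_classify`) are hypothetical, and the instance selection step mentioned in the abstract is omitted.

```python
import math
from collections import Counter


def subsequence_distance(shapelet, series):
    """Minimum Euclidean distance between a shapelet and any same-length window."""
    m = len(shapelet)
    best = math.inf
    for start in range(len(series) - m + 1):
        d = math.sqrt(sum((shapelet[i] - series[start + i]) ** 2 for i in range(m)))
        best = min(best, d)
    return best


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(distances, labels):
    """Return (information gain, threshold) for the best distance threshold."""
    order = sorted(zip(distances, labels))
    base = entropy(labels)
    best_gain, best_thr = 0.0, None
    for i in range(1, len(order)):
        if order[i - 1][0] == order[i][0]:
            continue  # no threshold fits between equal distances
        thr = (order[i - 1][0] + order[i][0]) / 2
        left = [lab for _, lab in order[:i]]
        right = [lab for _, lab in order[i:]]
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(order)
        if base - remainder > best_gain:
            best_gain, best_thr = base - remainder, thr
    return best_gain, best_thr


def lazy_classify(test, train, lengths):
    """Classify one test series; train is a list of (series, label) pairs.

    Returns the predicted label and the shapelets (taken from the test
    series) that were used to narrow down the candidate instances.
    """
    candidates = list(train)
    used = []
    while len({lab for _, lab in candidates}) > 1:
        labels = [lab for _, lab in candidates]
        best = (0.0, None, None, None)  # gain, threshold, shapelet, distances
        for m in lengths:
            for start in range(len(test) - m + 1):
                shp = test[start:start + m]
                dists = [subsequence_distance(shp, s) for s, _ in candidates]
                gain, thr = best_split(dists, labels)
                if gain > best[0]:
                    best = (gain, thr, shp, dists)
        gain, thr, shp, dists = best
        if shp is None:
            break  # no subsequence separates the remaining classes
        # The shapelet is cut from the test series (distance 0 to it),
        # so only the instances on the near side of the threshold are kept.
        candidates = [c for c, d in zip(candidates, dists) if d <= thr]
        used.append(shp)
    label = Counter(lab for _, lab in candidates).most_common(1)[0][0]
    return label, used
```

Calling `lazy_classify(test, train, lengths=[3])` returns both the label and the list of shapelets, so the subsequences of the test series that drove the decision can be inspected directly, which is the interpretability argument the abstract makes. The exhaustive search over every subsequence is chosen for clarity, not speed.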
Pages: 29-43
Page count: 14