Parallel Discord Discovery

Cited by: 38

Authors:

Huang, Tian [1]

Zhu, Yongxin [1]

Mao, Yishu [1]

Li, Xinyang [1]

Liu, Mengyun [1]

Wu, Yafei [1]

Ha, Yajun [2]

Dobbie, Gillian [3]

Affiliations:

[1] Shanghai Jiao Tong Univ, Sch Microelect, Shanghai, Peoples R China

[2] ASTAR, Inst Infocomm Res, Singapore, Singapore

[3] Univ Auckland, Dept Comp Sci, Auckland, New Zealand

Source:

ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2016, PT II | 2016 / Vol. 9652

Keywords:

Time series discord; Parallel; Large scale; In-memory computing; TIME-SERIES; SEARCH;

DOI:

10.1007/978-3-319-31750-2_19

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Discords are the most unusual subsequences of a time series. Sequential discovery of discords is time-consuming. As datasets keep growing, they have to be kept on hard disk, which degrades the utilization of computing resources. Furthermore, the results discovered from separate segments of a time series are non-combinable, which makes discord discovery hard to parallelize. In this paper, we propose Parallel Discord Discovery (PDD), which divides the discord discovery problem in a combinable manner and solves its sub-problems in parallel. PDD accelerates discord discovery with multiple computing nodes and guarantees the correctness of the results. PDD stores large time series in distributed memory and takes advantage of in-memory computing to improve the utilization of computing resources. Experiments show that, given 10 computing nodes, PDD is seven times faster than the sequential method HOTSAX. PDD is also able to handle larger datasets than HOTSAX. PDD achieves over 90% utilization of computing resources, nearly twice that of the disk-aware method.
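
To make the notion of a discord concrete, here is a minimal brute-force sketch in Python. It is not PDD and not HOTSAX. It assumes the definition commonly used in the discord literature, which the abstract does not spell out: the top discord of length m is the subsequence whose Euclidean distance to its nearest non-overlapping neighbour is largest. The names `znorm` and `brute_force_discord` are hypothetical, and z-normalising each subsequence is likewise an assumption.

```python
import math


def znorm(seq):
    """Z-normalise a subsequence; a constant subsequence maps to all zeros."""
    mean = sum(seq) / len(seq)
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / len(seq))
    if std == 0:
        return [0.0] * len(seq)
    return [(x - mean) / std for x in seq]


def brute_force_discord(series, m):
    """Return (start_index, distance) of the top discord of length m.

    Assumed definition: the subsequence whose distance to its nearest
    non-overlapping neighbour is the largest among all subsequences.
    """
    if len(series) < 3 * m - 1:
        # Below this length some subsequence has no non-overlapping neighbour.
        raise ValueError("series too short for this subsequence length")
    n = len(series) - m + 1
    subs = [znorm(series[i:i + m]) for i in range(n)]
    best_idx, best_dist = -1, -1.0
    for i in range(n):
        nearest = math.inf
        for j in range(n):
            if abs(i - j) < m:
                continue  # skip overlapping (trivial) matches
            nearest = min(nearest, math.dist(subs[i], subs[j]))
        if nearest > best_dist:
            best_idx, best_dist = i, nearest
    return best_idx, best_dist


if __name__ == "__main__":
    # A periodic signal with one corrupted sample at index 21.
    series = [0.0, 1.0, 0.0, -1.0] * 10
    series[21] = 5.0
    print(brute_force_discord(series, 4))
```

The nested loop compares every subsequence with every other, so the cost grows quadratically with the series length, which is one way to read the abstract's remark that sequential discovery is time-consuming. It also hints at why per-segment results are hard to combine: under this definition, a subsequence's nearest neighbour may lie in a different segment.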


Pages: 233-244

Page count: 12

