Correlation analysis techniques for uncertain time series

Cited by: 2
Authors
Orang, Mahsa [1]
Shiri, Nematollaah [1 ]
Affiliations
[1] Concordia Univ, Dept Comp Sci & Software Engn, Montreal, PQ, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Correlation analysis; Probabilistic queries; Query optimization; Query processing; Uncertain data; SEARCH;
DOI
10.1007/s10115-016-0939-7
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Many applications, such as location-based services and wireless sensor networks, generate and process uncertain time series (UTS), in which the exact value at each timestamp is unknown. Traditional correlation analysis and search techniques developed for standard time series are inadequate for the UTS analysis these applications require. Motivated by this need, we propose suitable concepts and techniques for UTS correlation analysis. We formalize the notions of normalization and correlation for UTS in two general settings, distinguished by the information available at each timestamp: (1) PDF-based UTS, where each timestamp has a probability density function, and (2) multiset-based UTS, where each timestamp has a multiset of observed values. For each setting, we formulate correlation as a random variable and develop techniques to determine its underlying probability density function. For setting (2), we also present probabilistic pruning and sampling techniques to speed up the search process. We conducted numerous experiments to evaluate the performance of the proposed techniques under different configurations using the UCR benchmark datasets. Our results indicate the effectiveness of the proposed techniques. For setting (2) in particular, our results show significant improvements in space utilization and computation time. We believe the proposed ideas and solutions lend themselves to powerful tools for UTS analysis and search tasks.
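To make the multiset-based setting concrete, the following is a minimal illustrative sketch, not the authors' algorithm. It assumes that one possible "world" of a UTS can be drawn by picking, independently and uniformly, one observed value from the multiset at each timestamp, and it estimates the distribution of the Pearson correlation by naive Monte Carlo sampling. The function names (`pearson`, `sample_correlations`, `prob_correlation_at_least`) and the toy data are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # Assumption for this sketch: a constant series is treated as uncorrelated.
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def sample_correlations(uts_x, uts_y, n_samples=1000, seed=0):
    """Draw possible worlds of two multiset-based UTS and return their correlations.

    Each UTS is a list of multisets (here: lists), one per timestamp.
    Assumption: values are drawn uniformly and independently per timestamp.
    """
    rng = random.Random(seed)
    correlations = []
    for _ in range(n_samples):
        x = [rng.choice(observations) for observations in uts_x]
        y = [rng.choice(observations) for observations in uts_y]
        correlations.append(pearson(x, y))
    return correlations


def prob_correlation_at_least(samples, threshold):
    """Empirical probability that the correlation is at least `threshold`."""
    return sum(1 for c in samples if c >= threshold) / len(samples)


# Toy data (hypothetical): four timestamps, two observations per timestamp.
uts_x = [[1.0, 1.2], [2.0, 2.1], [3.0, 2.9], [4.1, 4.0]]
uts_y = [[2.0, 2.2], [4.1, 3.9], [6.0, 6.2], [8.0, 7.9]]

samples = sample_correlations(uts_x, uts_y, n_samples=500)
estimate = prob_correlation_at_least(samples, 0.9)
```

Under these assumptions, `samples` is an empirical stand-in for the correlation random variable described in the abstract, and `estimate` approximates the probability that two UTS are correlated above a threshold, which is the kind of quantity a probabilistic correlation query would test.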
Pages: 79-116
Number of pages: 38