Finding Probabilistic Prevalent Colocations in Spatially Uncertain Data Sets

被引：50

作者：

Wang, Lizhen ^{[1
]}

Wu, Pinping ^{[1
]}

Chen, Hongmei ^{[1
]}

机构：

[1] Yunnan Univ, Sch Informat Sci & Engn, Dept Comp Sci & Engn, Kunming 650091, Yunnan Province, Peoples R China

来源：

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING | 2013年 / 25卷 / 04期

基金：

中国国家自然科学基金;

关键词：

Spatial colocations; spatially uncertain data set; possible worlds; probabilistic prevalent colocations (PPCs); dynamic programming; approximate algorithms; MINING FREQUENT ITEMSETS; ASSOCIATION RULES; DISCOVERY; PATTERNS;

D O I：

10.1109/TKDE.2011.256

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

A spatial colocation pattern is a group of spatial features whose instances are frequently located together in geographic space. Discovering colocations has many useful applications. For example, colocated plant species discovered from plant distribution data sets can contribute to the analysis of plant geography, phytosociology studies, and plant protection recommendations. In this paper, we study the colocation mining problem in the context of uncertain data, as the data generated from a wide range of data sources are inherently uncertain. One straightforward method to mine the prevalent colocations in a spatially uncertain data set is to simply compute the expected participation index of a candidate and decide if it exceeds a minimum prevalence threshold. Although this definition has been widely adopted, it misses important information about the confidence which can be associated with the participation index of a colocation. We propose another definition, probabilistic prevalent colocations, trying to find all the colocations that are likely to be prevalent in a randomly generated possible world. Finding probabilistic prevalent colocations (PPCs) turn out to be difficult. First, we propose pruning strategies for candidates to reduce the amount of computation of the probabilistic participation index values. Next, we design an improved dynamic programming algorithm for identifying candidates. This algorithm is suitable for parallel computation, and approximate computation. Finally, the effectiveness and efficiency of the methods proposed as well as the pruning strategies and the optimization techniques are verified by extensive experiments with "real + synthetic" spatially uncertain data sets.

引用

页码：790 / 804

页数：15

共 31 条

[1]

Aggarwal CC, 2009, KDD-09: 15TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, P29

[2] A Survey of Uncertain Data Algorithms and Applications [J].

Aggarwal, Charu C. ;

Yu, Philip S. .

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2009, 21 (05) :609-623

[3]

Agrawal P., 2006, VLDB, P1151

[4]

Agrawal R., P 20 INT C VERY LARG

[5]

[Anonymous], 2010, COMPUTER INFORM SCI, DOI DOI 10.5539/CIS.V3N2P171

[6]

Bernecker T, 2009, KDD-09: 15TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, P119

[7]

Chui CK, 2008, LECT NOTES ARTIF INT, V5012, P64, DOI 10.1007/978-3-540-68125-0_8

[8]

Chui CK, 2007, LECT NOTES COMPUT SC, V4426, P47

[9]

Ester M, 1999, LECT NOTES ARTIF INT, V1701, P61

[10]

Green T., 2006, DATA ENG B, V29, P25

← 1 2 3 4 →