Reducing the Search Space for Big Data Mining for Interesting Patterns from Uncertain Data

Times cited: 39

Authors:

Leung, Carson Kai-Sang [1]

MacKinnon, Richard Kyle [1]

Jiang, Fan [1]

Affiliation:

[1] Univ Manitoba, Dept Comp Sci, Winnipeg, MB R3T 2N2, Canada

Source:

2014 IEEE International Congress on Big Data (BigData Congress) | 2014

Keywords:

Big data models and algorithms; algorithms and programming techniques for Big data processing; Big data analytics; Big data search and mining; frequent patterns; constraints; uncertain data; parallel

DOI:

10.1109/BigData.Congress.2014.53

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline code:

0812

Abstract:

Many existing data mining algorithms search for interesting patterns in transactional databases of precise data. However, there are situations in which data are uncertain. Items in each transaction of these probabilistic databases of uncertain data are usually associated with existential probabilities, which express the likelihood of these items being present in the transaction. Compared with mining from precise data, the search space for mining from uncertain data is much larger due to the presence of the existential probabilities. This problem is worsened as we move into the era of Big data. Furthermore, in many real-life applications, users may be interested in only a tiny portion of this large search space for Big data mining. Without providing opportunities for users to express the interesting patterns to be mined, many existing data mining algorithms return numerous patterns, out of which only some are interesting. In this paper, we propose an algorithm that (i) allows users to express their interest in terms of constraints and (ii) uses the MapReduce model to mine uncertain Big data for frequent patterns that satisfy the user-specified constraints. By exploiting properties of the constraints, our algorithm greatly reduces the search space for Big data mining of uncertain data and returns only those patterns that are interesting to the users for Big data analytics.
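To illustrate the ideas in the abstract, here is a minimal single-machine sketch in Python. It is not the authors' algorithm, whose details are not given in this record. It assumes the commonly used expected-support measure: an itemset's expected support is the sum, over all transactions, of the product of its items' existential probabilities, which presumes that items occur independently. It also assumes a hypothetical succinct anti-monotone constraint, "every item's price is at most 10", over made-up prices. Items that violate the constraint are pruned before any counting. Each level of an Apriori-style search (our choice of search strategy) runs as one simulated map/reduce round. The names `map_phase`, `reduce_phase`, `mine_constrained`, `DB` and `PRICES` are all hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Iterable, Iterator, List, Tuple

# A transaction maps each item to its existential probability in (0, 1].
Transaction = Dict[str, float]
Itemset = FrozenSet[str]


def expected_support(itemset: Itemset, db: List[Transaction]) -> float:
    """Sum over transactions of the product of the items' probabilities
    (assumes items occur independently within a transaction)."""
    return sum(
        math.prod(t[i] for i in itemset)
        for t in db
        if all(i in t for i in itemset)
    )


def map_phase(t: Transaction, candidates: List[Itemset]) -> Iterator[Tuple[Itemset, float]]:
    """Simulated mapper: emit (candidate, probability that t contains it)."""
    for c in candidates:
        if all(i in t for i in c):
            yield c, math.prod(t[i] for i in c)


def reduce_phase(pairs: Iterable[Tuple[Itemset, float]]) -> Dict[Itemset, float]:
    """Simulated reducer: sum the emitted probabilities per candidate."""
    sums: Dict[Itemset, float] = defaultdict(float)
    for key, value in pairs:
        sums[key] += value
    return dict(sums)


def mine_constrained(
    db: List[Transaction], minsup: float, item_ok: Callable[[str], bool]
) -> Dict[Itemset, float]:
    # Hypothetical succinct anti-monotone constraint: drop violating items up
    # front, so no itemset containing them is ever generated or counted.
    items = sorted({i for t in db for i in t if item_ok(i)})
    candidates = [frozenset([i]) for i in items]
    frequent: Dict[Itemset, float] = {}
    k = 1
    while candidates:
        # One simulated MapReduce round per itemset size k.
        pairs = (kv for t in db for kv in map_phase(t, candidates))
        level = {c: s for c, s in reduce_phase(pairs).items() if s >= minsup}
        frequent.update(level)
        k += 1
        # Apriori-style join: keep a k-itemset only if all its (k-1)-subsets are frequent.
        next_cands = set()
        for a, b in combinations(level, 2):
            u = a | b
            if len(u) == k and all(frozenset(s) in level for s in combinations(u, k - 1)):
                next_cands.add(u)
        candidates = list(next_cands)
    return frequent


# Hypothetical toy data: existential probabilities and prices are made up.
DB: List[Transaction] = [
    {"a": 0.9, "b": 0.8, "c": 0.4},
    {"a": 0.7, "b": 0.5},
    {"a": 0.6, "c": 0.9, "d": 1.0},
]
PRICES = {"a": 5, "b": 8, "c": 12, "d": 3}

if __name__ == "__main__":
    result = mine_constrained(DB, minsup=1.0, item_ok=lambda i: PRICES[i] <= 10)
    for itemset, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), round(sup, 2))
```

With `minsup = 1.0`, the result is {a}, {b}, {d} and {a, b}. Item c has expected support 0.4 + 0.9 = 1.3, so it would be frequent without the constraint. Because it fails the price constraint, however, neither c nor any superset of it is ever counted. This shrinking of the search space is the effect the abstract describes. The level-wise pruning is sound because expected support is anti-monotone: multiplying in another probability of at most 1 can never increase a transaction's contribution.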


Pages: 315-322

Number of pages: 8
