Reducing the Search Space for Big Data Mining for Interesting Patterns from Uncertain Data

Times cited: 39
Authors
Leung, Carson Kai-Sang [1]
MacKinnon, Richard Kyle [1]
Jiang, Fan [1]
Affiliation
[1] Univ Manitoba, Dept Comp Sci, Winnipeg, MB R3T 2N2, Canada
Source
2014 IEEE INTERNATIONAL CONGRESS ON BIG DATA (BIGDATA CONGRESS), 2014
Keywords
Big data models and algorithms; algorithms and programming techniques for Big data processing; Big data analytics; Big data search and mining; frequent patterns; constraints; uncertain data; PARALLEL
DOI
10.1109/BigData.Congress.2014.53
Chinese Library Classification (CLC) number
TP [Automation technology and computer technology]
Discipline classification code
0812
Abstract
Many existing data mining algorithms search for interesting patterns in transactional databases of precise data. However, there are situations in which data are uncertain. Items in each transaction of these probabilistic databases of uncertain data are usually associated with existential probabilities, which express the likelihood of these items being present in the transaction. When compared with mining from precise data, the search space for mining from uncertain data is much larger due to the presence of the existential probabilities. This problem worsens as we move into the era of Big data. Furthermore, in many real-life applications, users may be interested in only a tiny portion of this large search space for Big data mining. Without giving users the opportunity to express the interesting patterns to be mined, many existing data mining algorithms return numerous patterns, out of which only some are interesting. In this paper, we propose an algorithm that (i) allows users to express their interest in terms of constraints and (ii) uses the MapReduce model to mine uncertain Big data for frequent patterns that satisfy the user-specified constraints. By exploiting properties of the constraints, our algorithm greatly reduces the search space for Big data mining of uncertain data and returns only those patterns that are interesting to the users for Big data analytics.
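To make the abstract's terms concrete, here is a minimal Python sketch. It is not the authors' algorithm; it illustrates the general idea under stated assumptions. Items are assumed independent within a transaction, so an itemset's expected support is the sum, over transactions, of the product of its items' existential probabilities. The user constraint is a hypothetical succinct, anti-monotone one, min(X.price) >= 30, which can be checked item by item in the map step. The toy database `DB`, the `PRICE` table, and the functions `satisfies`, `map_phase`, `reduce_phase`, `expected_support`, and `mine` are all hypothetical. For brevity, only the singleton counting is written in map/reduce style; larger itemsets are extended level-wise in a single process.

```python
from collections import defaultdict
from itertools import combinations
from math import prod

# Hypothetical toy uncertain database: each transaction maps an item to its
# existential probability (assumed independent within a transaction).
DB = [
    {"a": 0.9, "b": 0.8, "c": 0.5},
    {"a": 0.6, "c": 0.4, "d": 1.0},
    {"b": 0.7, "c": 0.9},
]

# Hypothetical item attribute referenced by the user constraint.
PRICE = {"a": 40, "b": 35, "c": 50, "d": 10}


def satisfies(item):
    """Item-level test for the hypothetical constraint min(X.price) >= 30.
    Because it is succinct and anti-monotone, an itemset satisfies it
    exactly when every one of its items does."""
    return PRICE[item] >= 30


def map_phase(transaction):
    """Map: emit (item, probability) pairs, dropping items that violate the constraint."""
    for item, p in transaction.items():
        if satisfies(item):
            yield item, p


def reduce_phase(pairs):
    """Reduce: sum the probabilities per item, giving each singleton's expected support."""
    totals = defaultdict(float)
    for item, p in pairs:
        totals[item] += p
    return dict(totals)


def expected_support(itemset, db):
    """Sum, over transactions containing every item, of the product of their probabilities."""
    return sum(prod(t[i] for i in itemset) for t in db if all(i in t for i in itemset))


def mine(db, minsup):
    """Return {itemset: expected support} for frequent itemsets satisfying the constraint."""
    singles = reduce_phase(kv for t in db for kv in map_phase(t))
    result = {frozenset([i]): s for i, s in singles.items() if s >= minsup}
    current, k = list(result), 2
    # Expected support is anti-monotone (each extra item multiplies by a
    # probability <= 1), so a k-candidate is generated only if all of its
    # (k-1)-subsets are already frequent.
    while current:
        items = sorted(set().union(*current))
        candidates = [
            frozenset(c)
            for c in combinations(items, k)
            if all(frozenset(sub) in result for sub in combinations(c, k - 1))
        ]
        current = []
        for c in candidates:
            s = expected_support(c, db)
            if s >= minsup:
                result[c] = s
                current.append(c)
        k += 1
    return result


if __name__ == "__main__":
    for itemset, s in sorted(mine(DB, minsup=1.0).items(), key=lambda kv: sorted(kv[0])):
        print(sorted(itemset), round(s, 2))
```

With minsup = 1.0, this sketch reports {a}, {b}, {c}, and {b, c}. Item d has an expected support of 1.0 and would be frequent without the constraint. Here it is discarded in the map step before any counting, which is the kind of search-space reduction the abstract attributes to exploiting constraint properties.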
Pages: 315–322
Number of pages: 8