Distributed Sequential Pattern Mining in Large Scale Uncertain Databases

Cited by: 11
Authors
Ge, Jiaqi [1]
Xia, Yuni [1]
Affiliations
[1] Indiana Univ Purdue Univ, Dept Comp & Informat Sci, Indianapolis, IN 46202 USA
Keywords
Uncertain databases; Sequential pattern mining; Distributed computing
DOI
10.1007/978-3-319-31750-2_2
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
While sequential pattern mining (SPM) is an important application in uncertain databases, achieving efficiency and scalability is challenging. In this paper, we develop a dynamic programming (DP) approach to mine probabilistic frequent sequential patterns on the distributed computing platform Spark. Directly applying the DP method on Spark is impractical because its high memory consumption may cause heavy JVM garbage collection overhead. Therefore, we design a memory-efficient distributed DP approach and use an extended prefix-tree to store intermediate results efficiently. Extensive experimental results at various scales show that our method is orders of magnitude faster than straightforward approaches.
Pages: 17-29
Number of pages: 13
Related papers
50 in total
  • [1] Large-Scale Sequential Utility Pattern Mining in Uncertain Environments
    Wu, Jimmy Ming-Tai
    Liu, Shuo
    Lin, Jerry Chun-Wei
    2022 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS, ICDMW, 2022, : 553 - 559
  • [2] Towards Efficient Sequential Pattern Mining in Temporal Uncertain Databases
    Ge, Jiaqi
    Xia, Yuni
    Wang, Jian
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PART II, 2015, 9078 : 268 - 279
  • [3] Mining Probabilistically Frequent Sequential Patterns in Large Uncertain Databases
    Zhao, Zhou
    Yan, Da
    Ng, Wilfred
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2014, 26 (05) : 1171 - 1184
  • [4] Parallel and Distributed Frequent Pattern Mining in Large Databases
    Tanbeer, Syed Khairuzzaman
    Ahmed, Chowdhury Farhan
    Jeong, Byeong-Soo
    HPCC: 2009 11TH IEEE INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, 2009, : 407 - 414
  • [5] WSpan: Weighted sequential pattern mining in large sequence databases
    Yun, Unil
    Leggett, John J.
    2006 3rd International IEEE Conference Intelligent Systems, Vols 1 and 2, 2006, : 503 - 508
  • [6] High average-utility sequential pattern mining based on uncertain databases
    Jerry Chun-Wei Lin
    Ting Li
    Matin Pirouz
    Ji Zhang
    Philippe Fournier-Viger
    Knowledge and Information Systems, 2020, 62 : 1199 - 1228
  • [7] High average-utility sequential pattern mining based on uncertain databases
    Lin, Jerry Chun-Wei
    Li, Ting
    Pirouz, Matin
    Zhang, Ji
    Fournier-Viger, Philippe
    KNOWLEDGE AND INFORMATION SYSTEMS, 2020, 62 (03) : 1199 - 1228
  • [8] Parallel and Distributed Algorithms for Frequent Pattern Mining in Large Databases
    Tanbeer, Syed Khairuzzaman
    Ahmed, Chowdhury Farhan
    Jeong, Byeong-Soo
    IETE TECHNICAL REVIEW, 2009, 26 (01) : 55 - 66
  • [9] Distributed Algorithm for Sequential Pattern Mining on a Large Sequence Dataset
    Tho Hoang
    Bac Le
    Minh-Thai Tran
    2017 9TH INTERNATIONAL CONFERENCE ON KNOWLEDGE AND SYSTEMS ENGINEERING (KSE 2017), 2017, : 18 - 23
  • [10] Mining weighted sequential patterns in incremental uncertain databases
    Roy, Kashob Kumar
    Moon, Md Hasibul Haque
    Rahman, Md Mahmudur
    Ahmed, Chowdhury Farhan
    Leung, Carson Kai-Sang
    INFORMATION SCIENCES, 2022, 582 : 865 - 896