Efficient Uncertain Sequence Pattern Mining Based on Hadoop Platform

Cited by: 4
Authors
Wu, Jimmy Ming-Tai [1]
Liu, Shuo [1]
Lin, Jerry Chun-Wei [2]
Affiliations
[1] Shandong Univ Sci & Technol, Coll Comp Sci & Engn, Qingdao 266590, Shandong, Peoples R China
[2] Western Norway Univ Appl Sci, Dept Comp Sci Elect Engn & Math Sci, N-5063 Bergen, Norway
Keywords
Data mining; big data; sequential patterns; uncertain; Hadoop; DISCOVERY
DOI
10.1142/S0218126622502619
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In the Internet of Things (IoT) era, information is collected by sensor devices, which can lead to data loss, uncertain data, and other problems. To extract information useful for production and application from a huge warehouse of uncertain data, the collected data must be represented with probabilities. Because the data in the database has a particular order in time or space, High-Utility Probability Sequential Pattern Mining (HUPSPM) has become a new topic of investigation and analysis in data processing. Over time, many efficient algorithms for sequential pattern mining have been developed. However, these algorithms have a limitation: they can only be executed in a stand-alone environment and are only suitable for small datasets. Introducing an advanced graph framework for processing large datasets therefore addresses the shortcomings of the existing methods. The proposed algorithm avoids repeated database searches, splits the database, and improves parallel computing capability. The initial database is pruned according to an existing pruning strategy to effectively reduce the number of candidate sets. Experiments show that the algorithm presented in this paper has clear advantages in mining high-utility probability sequences in large datasets.
Pages: 15