Large-Scale High-Utility Sequential Pattern Analytics in Internet of Things

Cited by: 30
Authors
Srivastava, Gautam [1 ,2 ]
Lin, Jerry Chun-Wei [3 ]
Zhang, Xuyun [4 ]
Li, Yuanfa [5 ]
Affiliations
[1] Brandon Univ, Dept Math & Comp Sci, Brandon, MB R7A 6A9, Canada
[2] China Med Univ, Res Ctr Interneural Comp, Taichung 404, Taiwan
[3] Western Norway Univ Appl Sci, Dept Comp Sci Elect Engn & Math Sci, N-5063 Bergen, Norway
[4] Macquarie Univ, Dept Comp, Sydney, NSW 2109, Australia
[5] Harbin Inst Technol Shenzhen, Sch Comp Sci & Technol, Shenzhen 518055, Peoples R China
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
Data mining; Internet of Things; Databases; Sparks; Smart devices; Electronic mail; Big Data; Analytics; big data; data mining; edge computing; efficient computation; Internet of Things (IoT); sequential patterns;
DOI
10.1109/JIOT.2020.3026826
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Sequential pattern mining has become a growing topic in data mining, most recently finding a home in the Internet of Things (IoT), where large volumes of data arrive every second for analysis and knowledge extraction. One key topic within sequential pattern mining is high-utility sequential pattern mining (HUSPM). HUSPM combines utility and sequence factors to identify sequential patterns of high utility in databases and other data sources. However, almost all existing literature focuses on using a single machine to increase mining performance. In this work, we present a four-stage MapReduce framework for HUSPM that is based solely on the well-known Spark platform. The framework is shown to deliver faster, more efficient mining on large data sets. It consists of four phases (initialization, mining, updating, and generation) that handle big data sets using the MapReduce framework running on the Spark platform. Experiments indicate that the designed model can handle very large data sets, whereas state-of-the-art algorithms achieve good performance only on small data sets.
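To make the "utility plus sequence" idea in the abstract concrete, the following is a minimal single-machine sketch in Python, not the paper's Spark implementation. It assumes a common HUSPM formulation in which each sequence is an ordered list of itemsets with purchase quantities, each item has a unit profit, and the utility of a pattern in a sequence is the maximum utility over all of its ordered occurrences. The names `sequence_utility`, `pattern_utility`, `profit`, and the toy database are hypothetical and chosen only for illustration.

```python
from typing import Dict, FrozenSet, List, Optional

QSequence = List[Dict[str, int]]   # ordered itemsets: item -> quantity
Pattern = List[FrozenSet[str]]     # ordered itemsets of the pattern


def sequence_utility(pattern: Pattern, qseq: QSequence,
                     profit: Dict[str, int]) -> Optional[int]:
    """Maximum utility of `pattern` over its occurrences in `qseq`.

    Returns None when the pattern does not occur in the sequence.
    """
    # best[j] = best utility of matching the first j pattern elements so far
    best: List[Optional[int]] = [None] * (len(pattern) + 1)
    best[0] = 0
    for itemset in qseq:
        # Go through j in reverse so one itemset matches at most one element.
        for j in range(len(pattern), 0, -1):
            prev = best[j - 1]
            if prev is None or not pattern[j - 1].issubset(itemset):
                continue
            gain = sum(itemset[i] * profit[i] for i in pattern[j - 1])
            if best[j] is None or prev + gain > best[j]:
                best[j] = prev + gain
    return best[len(pattern)]


def pattern_utility(pattern: Pattern, db: List[QSequence],
                    profit: Dict[str, int]) -> int:
    """Sum of per-sequence utilities over the sequences containing the pattern."""
    utilities = (sequence_utility(pattern, s, profit) for s in db)
    return sum(u for u in utilities if u is not None)


# Toy data (assumed for illustration only).
profit = {"a": 2, "b": 5, "c": 1}
db = [
    [{"a": 1, "b": 2}, {"c": 3}, {"a": 4}, {"b": 1}],
    [{"b": 1}, {"a": 2}],
]
pattern = [frozenset({"a"}), frozenset({"b"})]
total = pattern_utility(pattern, db, profit)
is_high_utility = total >= 10  # 10 is an arbitrary minimum-utility threshold
```

Each per-sequence utility is computed independently of the others and the results are then summed, which is the kind of structure that lends itself to a map step followed by a reduce step; how the paper's four phases actually partition this work is not specified in the abstract.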
Pages: 12669-12678
Page count: 10