A novel multi-core algorithm for frequent itemsets mining in data streams

Cited by: 3

Authors:

Bustio-Martinez, Lazaro [1]

Munoz-Briseno, Alfredo [2]

Cumplido, Rene [1]

Hernandez-Leon, Raudel [2]

Feregrino-Uribe, Claudia A. [1]

Affiliations:

[1] Natl Inst Astrophys Opt & Elect, Luis Enrique Erro 1, Puebla 72840, Mexico

[2] Adv Technol Applicat Ctr, 7a 21406 E 214&216, Havana 12200, Cuba

Source:

PATTERN RECOGNITION LETTERS | 2019 / Vol. 125

Keywords:

Frequent itemsets mining; Data streams; Lexicographic order; Gearman; Parallel algorithms; SPARK;

DOI:

10.1016/j.patrec.2019.05.003

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Data streams are modern data sources that are gaining attention because of their many practical applications (they can be found in data transmission, eCommerce, and intrusion detection systems, among others). Nevertheless, efforts to obtain insights from data streams are limited by their massive information volume and the time needed to process them. In this paper, a new approach for Frequent Itemsets Mining on data streams is proposed; it is based on prefix trees and takes advantage of multi-core systems. The approach uses the Gearman framework as the interface for multi-core processing, which allows the scalability of such systems to be exploited efficiently. Experimental results show that the proposed method obtains the same patterns as similar state-of-the-art approaches and outperforms them in processing time. It is also shown that the proposed method is insensitive to variations in the support threshold value, and that its efficiency depends on the size of the transactions and not on the size of the alphabet, which is a significant issue in other Frequent Itemsets Mining algorithms. (C) 2019 Elsevier B.V. All rights reserved.
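The abstract names the ingredients (prefix trees, lexicographic order, a support threshold) without giving the procedure. The sketch below is a minimal single-process illustration of counting itemsets from a stream in a prefix tree; it is not the authors' algorithm and does not use Gearman. The names Node, update, frequent_itemsets and the max_len cap are assumptions made for illustration only.

```python
from itertools import combinations


class Node:
    """One prefix-tree node: a support counter plus children keyed by item."""

    def __init__(self):
        self.count = 0
        self.children = {}


def update(root, transaction, max_len=3):
    """Count every itemset of up to max_len items from one streamed transaction.

    max_len is a hypothetical cap that keeps the sketch cheap.
    """
    # Sorting gives each itemset exactly one path in the tree.
    items = sorted(set(transaction))
    for k in range(1, min(max_len, len(items)) + 1):
        for itemset in combinations(items, k):
            node = root
            for item in itemset:
                node = node.children.setdefault(item, Node())
            node.count += 1


def frequent_itemsets(root, min_count, prefix=()):
    """Yield (itemset, count) pairs whose count reaches min_count."""
    for item, child in sorted(root.children.items()):
        # An infrequent prefix cannot have frequent extensions, so skip its subtree.
        if child.count >= min_count:
            itemset = prefix + (item,)
            yield itemset, child.count
            yield from frequent_itemsets(child, min_count, itemset)


if __name__ == "__main__":
    root = Node()
    for transaction in [["a", "b", "c"], ["a", "b"], ["b", "c"], ["a", "c", "b"]]:
        update(root, transaction)
    for itemset, count in frequent_itemsets(root, min_count=3):
        print(itemset, count)
```

In this sketch the work per transaction grows with the number of items in that transaction rather than with the number of distinct items seen overall, and the threshold is applied only when the tree is read.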

Citation

Pages: 241-248

Page count: 8
