Frequent Itemset Mining in Big Data With Effective Single Scan Algorithms

被引：31

作者：

Djenouri, Youcef ^{[1
,2
]}

Djenouri, Djamel ^{[3
]}

Lin, Jerry Chun-Wei ^{[4
,5
]}

Belhadi, Asma ^{[6
]}

机构：

[1] Southern Denmark Univ, Sch Math & Comp Sci, DK-5230 Odense, Denmark

[2] Norwegian Univ Sci & Technol, Dept Comp Sci, N-7491 Trondheim, Norway

[3] CERIST Res Ctr, Algiers 16028, Algeria

[4] Harbin Inst Technol Shenzhen, Sch Comp Sci & Technol, Shenzhen 150006, Peoples R China

[5] Western Norway Univ Appl Sci, Dept Comp Math & Phys, N-5020 Bergen, Norway

[6] Univ Sci & Technol Houari Boumediene, RIMA, Algiers 16111, Algeria

来源：

IEEE ACCESS | 2018年 / 6卷

基金：

中国国家自然科学基金;

关键词：

Apriori; frequent itemset mining; heuristic; parallel computing; support computing; EFFICIENT ALGORITHM; MAPREDUCE; PATTERNS;

D O I：

10.1109/ACCESS.2018.2880275

中图分类号：

TP [自动化技术、计算机技术];

学科分类号：

0812 ;

摘要：

This paper considers frequent itemsets mining in transactional databases. It introduces a new accurate single scan approach for frequent itemset mining (SSFIM), a heuristic as an alternative approach (EA-SSFIM), as well as a parallel implementation on Hadoop clusters (MR-SSFIM). EA-SSFIM and MR-SSFIM target sparse and big databases, respectively. The proposed approach (in all its variants) requires only one scan to extract the candidate itemsets, and it has the advantage to generate a fixed number of candidate itemsets independently from the value of the minimum support. This accelerates the scan process compared with existing approaches while dealing with sparse and big databases. Numerical results show that SSFIM outperforms the state-of-the-art FIM approaches while dealing with medium and large databases. Moreover, EA-SSFIM provides similar performance as SSFIM while considerably reducing the runtime for large databases. The results also reveal the superiority of MR-SSFIM compared with the existing HPC-based solutions for FIM using sparse and big databases.

引用

页码：68013 / 68026

页数：14

共 39 条

[1]

Agrawal R., 1993, SIGMOD Record, V22, P207, DOI 10.1145/170036.170072

[2]

[Anonymous], 2005, P 1 INT WORKSH OP SO, DOI DOI 10.1145/1133905.1133907

[3] Frequent Itemsets Mining for Big Data: A Comparative Analysis [J].

Apiletti, Daniele ;

Baralis, Elena ;

Cerquitelli, Tania ;

Garza, Paolo ;

Pulvirenti, Fabio ;

Venturini, Luca .

BIG DATA RESEARCH, 2017, 9 :67-83

[4] An Iterative MapReduce Based Frequent Subgraph Mining Algorithm [J].

Bhuiyan, Mansurul A. ;

Al Hasan, Mohammad .

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2015, 27 (03) :608-620

[5]

Brin S., 1997, SIGMOD Record, V26, P255, DOI [10.1145/253262.253327, 10.1145/253262.253325]

[6] Closed Patterns Meet n-ary Relations [J].

Cerf, Loic ;

Besson, Jeremy ;

Robardet, Celine ;

Boulicaut, Jean-Francois .

ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2009, 3 (01)

[7] Mapreduce: Simplified data processing on large clusters [J].

Dean, Jeffrey ;

Ghemawat, Sanjay .

COMMUNICATIONS OF THE ACM, 2008, 51 (01) :107-113

[8] Intelligent mapping between GPU and cluster computing for discovering big association rules [J].

Djenouri, Youcef ;

Djenouri, Djamel ;

Habbas, Zineb .

APPLIED SOFT COMPUTING, 2018, 65 :387-399

[9] Combining Apriori heuristic and bio-inspired algorithms for solving the frequent itemsets mining problem [J].

Djenouri, Youcef ;

Comuzzi, Marco .

INFORMATION SCIENCES, 2017, 420 :1-15

[10] Data Mining-Based Decomposition for Solving the MAXSAT Problem: Toward a New Approach [J].

Djenouri, Youcef ;

Habbas, Zineb ;

Djenouri, Djamel .

IEEE INTELLIGENT SYSTEMS, 2017, 32 (04) :48-58

← 1 2 3 4 →