Fast Top-K association rule mining using rule generation property pruning

被引：25

作者：

Liu, Xiangyu ^{[1
]}

Niu, Xinzheng ^{[1
]}

Fournier-Viger, Philippe ^{[2
]}

机构：

[1] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu, Peoples R China

[2] Shenzhen Grad Sch, Harbin Inst Technol, Sch Humanities & Social Sci, Shenzhen, Peoples R China

来源：

APPLIED INTELLIGENCE | 2021年 / 51卷 / 04期

关键词：

Data mining; Association rule mining; Top-k rules; Rule expansion; HIGH-UTILITY ITEMSETS; FREQUENT PATTERNS; EFFICIENT ALGORITHM; KNOWLEDGE; LOGS;

D O I：

10.1007/s10489-020-01994-9

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

Traditional association rule mining algorithms can have a long runtime, high memory consumption, and generate a huge number of rules. Browsing through numerous rules and adjusting parameters to find just enough rules is a tedious task for users, who are often only interested in finding the strongest rules. Hence, many recent studies have focused on mining the top-k most frequent association rules that have a minimum confidence so as to limit the number of rules by ranking them by frequency. Though this redefined task has many applications, the performance of current algorithms remains an issue. To address this issue, this paper presents a novel algorithm named FTARM (Fast Top-K Association Rule Miner) to efficiently find the set of top-k association rules using a novel technique called Rule Generation Property Pruning (RGPP). This technique reduces the search space by analyzing the internal relationships between items of the database to be mined and the parameters set by users. Furthermore, a novel candidate pruning property is used by this technique to speed up the mining process. FTARM's efficiency was evaluated on various public benchmark datasets. A substantial reduction of the association rule mining time and memory usage was observed, and that FTARM has good scalability, which can benefit to many applications.

引用

页码：2077 / 2093

页数：17

共 50 条

[1]

Agrawal R., 1993, SIGMOD Record, V22, P207, DOI 10.1145/170036.170072

[2] WCBA: Weighted classification based on association rules algorithm for breast cancer disease [J].

Alwidian, Jaber ;

Hammo, Bassam H. ;

Obeid, Nadim .

APPLIED SOFT COMPUTING, 2018, 62 :536-549

[3] Association rule mining using treap [J].

Anand, H. S. ;

Vinodchandra, S. S. .

INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2018, 9 (04) :589-597

[4]

ANWAR T, 2019, J KING SAUD U

[5] Incremental Algorithm for Association Rule Mining under Dynamic Threshold [J].

Aqra, Iyad ;

Ghani, Norjihan Abdul ;

Maple, Carsten ;

Machado, Jose ;

Safa, Nader Sohrabi .

APPLIED SCIENCES-BASEL, 2019, 9 (24)

[6] negFIN: An efficient algorithm for fast mining frequent itemsets [J].

Aryabarzan, Nader ;

Minaei-Bidgoli, Behrouz ;

Teshnehlab, Mohammad .

EXPERT SYSTEMS WITH APPLICATIONS, 2018, 105 :129-143

[7] Mining top-rank-k frequent weighted itemsets using WN-list structures and an early pruning strategy [J].

Bay Vo ;

Huong Bui ;

Thanh Vo ;

Tuong Le .

KNOWLEDGE-BASED SYSTEMS, 2020, 201

[8] Using hashing and lexicographic order for Frequent Itemsets Mining on data streams [J].

Bustio-Martinez, Lazaro ;

Letras-Luna, Martin ;

Cumplido, Rene ;

Hernandez-Leon, Raudel ;

Feregrino-Uribe, Claudia ;

Bande-Serrano, Jose M. .

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2019, 125 :58-71

[9] GMiner: A fast GPU-based frequent itemset mining method for large-scale data [J].

Chon, Kang-Wook ;

Hwang, Sang-Hyun ;

Kim, Min-Soo .

INFORMATION SCIENCES, 2018, 439 :19-38

[10] Mining top-k frequent patterns in the presence of the memory constraint [J].

Chuang, Kun-Ta ;

Huang, Jiun-Long ;

Chen, Ming-Syan .

VLDB JOURNAL, 2008, 17 (05) :1321-1344

← 1 2 3 4 5 →