Map-optimize-reduce: CAN tree assisted FP-growth algorithm for clusters based FP mining on Hadoop

Cited by: 21

Authors:

Ragaventhiran, J. [1]

Kavithadevi, M. K. [2]

Affiliations:

[1] Syed Ammal Engn Coll, Dept CSE, Ramanathapuram, India

[2] Thiagarajar Coll Engn, Dept CSE, Madurai, Tamil Nadu, India

Source:

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE | 2020 / Vol. 103

Keywords:

Frequent pattern mining; Map-optimize-reduce; Clustering; Load balancing; CAN tree based FP growth; User query; FREQUENT PATTERNS; SEQUENTIAL PATTERNS; SKEWED DATA; MAPREDUCE;

DOI:

10.1016/j.future.2019.09.041

Chinese Library Classification:

TP301 [Theory, Methods];

Discipline classification code:

081202;

Abstract:

In recent years, Frequent Pattern Mining (FPM) has emerged as a significant approach for discovering interesting knowledge concealed in data. However, previous works have not addressed the validation of FPM with user queries, and achieving good scalability and execution time remains a bottleneck owing to the difficulty of handling large datasets. To address these shortcomings, our proposed work performs FPM using an extended version of the MapReduce framework in a Hadoop environment. It comprises five processes: 1) Preprocessing, 2) Affinity Propagation (AP) based Clustering, 3) Load Balancing, 4) Map-Optimize-Reduce, and 5) Mining User Queries. First, preprocessing removes data redundancy. To speed up the MapReduce framework, we propose AP clustering, which generates effective clusters from the given dataset. Load balancing is executed to balance the load among different blocks, for which a reputation is computed. To avoid oversights in scanning and to minimize the search space in MapReduce, an optimizer that uses Emperor Penguin Colony (EPC) optimization is included between the Mapper and the Reducer. Frequent patterns are mined using CANonical order (CAN) tree based Frequent Pattern (FP) growth, which reduces execution time and frequent tree construction. The user submits a Mining_Request to Hadoop, and the frequent patterns mined for the given query are sent back to the user. If the user's query is not present in the CAN tree, Relevance Feedback is sent to the user as a recommendation. Finally, we validate the performance of our proposed work against previous works on the following metrics: Execution Time, Response Time, Load Balancing Rate, and Scalability. (C) 2019 Published by Elsevier B.V.
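
To make the CAN tree idea in the abstract concrete, the following is a minimal single-machine sketch and not the authors' implementation. It assumes that the canonical order is plain lexicographic order and that a user query is answered by looking up the support of an itemset. The names `Node`, `build_can_tree`, and `support` are hypothetical; the Hadoop, clustering, load balancing, and EPC optimizer stages are not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional


@dataclass
class Node:
    """One node of the prefix tree: an item label and a transaction count."""
    item: Optional[str] = None
    count: int = 0
    children: Dict[str, "Node"] = field(default_factory=dict)


def build_can_tree(transactions: Iterable[Iterable[str]]) -> Node:
    """Insert every transaction in canonical order (assumed: lexicographic).

    The order does not depend on item frequencies, so a single scan of the
    data is enough to build the tree.
    """
    root = Node()
    for transaction in transactions:
        node = root
        for item in sorted(set(transaction)):
            node = node.children.setdefault(item, Node(item))
            node.count += 1
    return root


def support(root: Node, itemset: Iterable[str]) -> int:
    """Count the transactions that contain every item of `itemset`."""
    query = sorted(set(itemset))
    if not query:
        return 0
    last, needed = query[-1], set(query[:-1])
    total = 0
    stack = [(root, frozenset())]
    while stack:
        node, seen = stack.pop()
        for child in node.children.values():
            if child.item == last:
                # Each matching transaction passes through exactly one node
                # labelled `last` whose ancestors hold the other query items.
                if needed <= seen:
                    total += child.count
            elif child.item < last:
                stack.append((child, seen | {child.item}))
            # Children greater than `last` cannot lead to `last`: skip them.
    return total


if __name__ == "__main__":
    tree = build_can_tree([["a", "b", "c"], ["b", "c"], ["a", "c"], ["a", "b"]])
    print(support(tree, ["a", "b"]))  # 2
    print(support(tree, ["a", "d"]))  # 0, i.e. the query is not in the tree
```

A support of 0 corresponds to the case the abstract describes as a query that is not present in the CAN tree; how Relevance Feedback is then generated is not specified in this record and is left out of the sketch.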

References

Pages: 111-122

Page count: 12

Total: 34 entries

[1] Kavosh: an effective Map-Reduce-based association rule mining method [J].

Barkhordari, Mohammadhossein ;

Niamanesh, Mahdi .

JOURNAL OF BIG DATA, 2018, 5 (01)

[2]

Chong KH, 2018, ROUT ADV REGION ECON, P1

[3] Frequent Itemset Mining in Big Data With Effective Single Scan Algorithms [J].

Djenouri, Youcef ;

Djenouri, Djamel ;

Lin, Jerry Chun-Wei ;

Belhadi, Asma .

IEEE ACCESS, 2018, 6 :68013-68026

[4] MapFIM: Memory Aware Parallelized Frequent Itemset Mining in Very Large Datasets [J].

Duong, Khanh-Chuong ;

Bamha, Mostafa ;

Giacometti, Arnaud ;

Li, Dominique ;

Soulet, Arnaud ;

Vrain, Christel .

DATABASE AND EXPERT SYSTEMS APPLICATIONS, DEXA 2017, PT I, 2017, 10438 :478-495

[5] Load balancing in join algorithms for skewed data in MapReduce systems [J].

Gavagsaz, Elaheh ;

Rezaee, Ali ;

Javadi, Hamid Haj Seyyed .

JOURNAL OF SUPERCOMPUTING, 2019, 75 (01) :228-254

[6] Load balancing in reducers for skewed data in MapReduce systems by using scalable simple random sampling [J].

Gavagsaz, Elaheh ;

Rezaee, Ali ;

Javadi, Hamid Haj Seyyed .

JOURNAL OF SUPERCOMPUTING, 2018, 74 (07) :3415-3440

[7] A New Methodology for Mining Frequent Itemsets on Temporal Data [J].

Ghorbani, Mazaher ;

Abessi, Masoud .

IEEE TRANSACTIONS ON ENGINEERING MANAGEMENT, 2017, 64 (04) :566-573

[8]

Huang C.-J., 2018, CLOTHING LANDMARK DE, P1

[9] Mining of productive periodic-frequent patterns for IoT data analytics [J].

Ismail, Walaa N. ;

Hassan, Mohammad Mehedi ;

Alsalamah, Hessah A. .

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2018, 88 :512-523

[10] Mining maximal frequent patterns in transactional databases and dynamic data streams: A spark-based approach [J].

Karim, Md. Rezaul ;

Cochez, Michael ;

Beyan, Oya Deniz ;

Ahmed, Chowdhury Farhan ;

Decker, Stefan .

INFORMATION SCIENCES, 2018, 432 :278-300
