Distributed synthesized association mining for big transactional data

Cited by: 4
Authors
Pal, Amrit [1, 2]
Kumar, Manish [2]
Affiliations
[1] GLA Univ, Dept Comp Engn & Applicat, Mathura, India
[2] Indian Inst Informat Technol Allahabad, Dept Informat Technol, Prayagraj, India
Source
SADHANA-ACADEMY PROCEEDINGS IN ENGINEERING SCIENCES | 2020, Vol. 45, Issue 01
Keywords
Big Data; HDFS; MapReduce; Apriori; frequent itemset; association rule; DATA SETS; RULES; PATTERNS
DOI
10.1007/s12046-020-01380-8
Chinese Library Classification
T [Industrial Technology]
Discipline classification code
08
Abstract
Transactional databases are growing rapidly. Partitioning this data and storing it in a distributed manner is an effective approach to storage and retrieval. Mining such distributed data with minimal dependence between sub-problems is a crucial task. Finding frequent itemsets and the corresponding association rules is challenging when results must be aggregated across a distributed environment. To overcome these challenges, we propose a distributed frequent itemset generation and association rule mining algorithm using the MapReduce programming model. The proposed scheme generates frequent itemsets and mines association rules using a synthesized distributed technique. The rules are mined in a distributed manner, and weights are then assigned to the subsets of data and to the association rules. The rules generated in a distributed manner are combined using a weighted approach. This paper presents a novel MapReduce-based synthesis approach that works well over distributed storage of large amounts of data.
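The abstract does not spell out the weighting scheme, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes each data partition is weighted by its share of all transactions, and that a synthesized support is the weighted sum of the local supports reported by the partitions. The function names (`local_frequent_itemsets`, `synthesize`, `rule_confidence`) and the `max_size` cap are hypothetical, and plain Python functions stand in for the MapReduce map and reduce phases.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations


def local_frequent_itemsets(transactions, min_support, max_size=2):
    """Map-side sketch: local supports of itemsets up to max_size in one partition."""
    counts = defaultdict(int)
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    n = len(transactions)
    # Keep only itemsets that are frequent within this partition.
    return {iset: Fraction(c, n) for iset, c in counts.items()
            if Fraction(c, n) >= min_support}


def synthesize(partitions, min_support, max_size=2):
    """Reduce-side sketch: weighted mixture of the locally reported supports."""
    total = sum(len(p) for p in partitions)
    combined = defaultdict(Fraction)
    for p in partitions:
        # Assumed weighting: the partition's share of all transactions.
        weight = Fraction(len(p), total)
        for iset, sup in local_frequent_itemsets(p, min_support, max_size).items():
            combined[iset] += weight * sup
    return {iset: s for iset, s in combined.items() if s >= min_support}


def rule_confidence(supports, antecedent, consequent):
    """Confidence of antecedent -> consequent from synthesized supports."""
    union = tuple(sorted(set(antecedent) | set(consequent)))
    return supports[union] / supports[tuple(sorted(antecedent))]


partitions = [
    [["a", "b"], ["a", "b"], ["a"], ["b", "c"]],  # partition 1: 4 transactions
    [["a", "b"], ["c"]],                          # partition 2: 2 transactions
]
supports = synthesize(partitions, min_support=Fraction(1, 2))
confidence = rule_confidence(supports, ("a",), ("b",))
```

Because a partition reports only its locally frequent itemsets, a synthesized support in this sketch is a lower bound on the true global support whenever some partition omits the itemset. Exact fractions are used instead of floats so that threshold comparisons are not affected by rounding.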
Pages: 13