Association rule mining algorithm based on Spark for pesticide transaction data analyses

Cited by: 6
Authors
Bai, Xiaoning [1 ,2 ]
Jia, Jingdun [1 ,3 ]
Wei, Qiwen [4 ]
Huang, Shuaiqi [1 ]
Du, Weicheng [5 ]
Gao, Wanlin [1 ]
Affiliations
[1] China Agr Univ, Coll Informat & Elect Engn, Beijing 100083, Peoples R China
[2] Minist Agr & Rural Affairs, Inst Control Agrochem, Beijing 100125, Peoples R China
[3] Minist Sci & Technol, Torch Ctr, Beijing 100045, Peoples R China
[4] Natl Agr Technol Promot Ctr, Beijing 100125, Peoples R China
[5] Minist Agr & Rural Affairs, Informat Ctr, Beijing 100125, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Spark; association rule mining; ICAMA algorithm; big data; pesticide regulation; MapReduce;
DOI
10.25165/j.ijabe.20191205.4881
Chinese Library Classification number
S2 [Agricultural Engineering]
Discipline classification code
0828
Abstract
With the development of smart agriculture, data in the field of pesticide regulation have accumulated to a considerable scale. The pesticide transaction data collected by the Pesticide National Data Center alone amount to more than 10 million records daily. However, because the technical means are outdated, the existing pesticide supervision data have not been deeply mined or used. The Apriori algorithm is one of the classic algorithms in association rule mining, but it needs to traverse the transaction database multiple times, which causes an extra IO burden. Spark is an emerging big data parallel computing framework with advantages such as in-memory computing and resilient distributed datasets (RDDs). Compared with the Hadoop MapReduce computing framework, its IO performance is greatly improved. Therefore, this paper proposed ICAMA, an improved Apriori algorithm based on the Spark framework. The MapReduce process was used to compute the support of the candidate set and then to generate the next candidate set. In experimental comparison, when the data volume exceeded 250 Mb, the performance of the Spark-based Apriori algorithm was 20% higher than that of the traditional Hadoop-based Apriori algorithm, and the improvement became more pronounced as the data volume increased.
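The abstract's point about Apriori rescanning the transaction database can be illustrated with a minimal sketch in plain Python. This is not the paper's ICAMA algorithm and does not use Spark; it is a textbook Apriori loop in which support counting is written in a map-then-sum style. The function names (`count_support`, `generate_candidates`, `apriori`), the item labels, and the absolute `min_support` threshold are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def count_support(transactions, candidates):
    """One full pass over the data: emit each candidate a transaction
    contains ("map"), then sum the occurrences per candidate ("reduce")."""
    counts = Counter()
    for t in transactions:
        for c in candidates:
            if c <= t:  # candidate itemset is a subset of the transaction
                counts[c] += 1
    return counts


def generate_candidates(frequent, k):
    """Build k-itemsets from the items seen in frequent (k-1)-itemsets,
    keeping only those whose every (k-1)-subset is frequent."""
    items = sorted({i for s in frequent for i in s})
    return {
        frozenset(c)
        for c in combinations(items, k)
        if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
    }


def apriori(transactions, min_support):
    """Return {itemset: support count} for all itemsets meeting min_support.
    min_support is an absolute count (an assumption of this sketch)."""
    transactions = [frozenset(t) for t in transactions]
    candidates = {frozenset([i]) for t in transactions for i in t}
    result = {}
    k = 1
    while candidates:
        # Each iteration rescans every transaction: the repeated-traversal
        # cost that the abstract attributes to Apriori.
        counts = count_support(transactions, candidates)
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
        candidates = generate_candidates(set(frequent), k)
    return result


# Hypothetical toy transactions; the item labels are placeholders.
toy = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
frequent_itemsets = apriori(toy, min_support=3)
```

In a distributed setting, the inner counting loop is the part that would typically be expressed as a map step followed by a per-key sum, with the transactions held in memory between iterations rather than reread from disk.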
Pages: 162-166
Number of pages: 5
Related papers
50 in total
  • [41] Data squashing as preprocessing in association rule mining
    Fister, Iztok
    Fister, Iztok, Jr.
    Novak, Damijan
    Verber, Domen
    2022 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI), 2022, : 1720 - 1725
  • [42] Reservoir Attribute Association Analysis Algorithm for Enhanced Oil Recovery Based on Association Rule Mining
    Teng, Lihui
    PROCEEDINGS OF 2024 INTERNATIONAL CONFERENCE ON MACHINE INTELLIGENCE AND DIGITAL APPLICATIONS, MIDA2024, 2024, : 588 - 595
  • [43] Spark-based Spatial Association Mining
    Binzani, Kanika
    Yoo, Jin Soung
    2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2018, : 5300 - 5301
  • [44] ODAM: An optimized distributed association rule mining algorithm
    Ashrafi, Mafruz Zaman
    Taniar, David
    Smith, Kate
    IEEE Distributed Systems Online, 2004, 5 (03): : 1 - 18
  • [45] A Novel Algorithm for Association Rule Mining Without Candidate
    Zhou, Huanyin
    Liu, Jinsheng
    FIRST IITA INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2009, : 116 - +
  • [46] An Hybrid Multi-Core/GPU-Based Mimetic Algorithm for Big Association Rule Mining
    Djenouri, Youcef
    Belhadi, Asma
    Fournier-Viger, Philippe
    Lin, Jerry Chun-Wei
    GENETIC AND EVOLUTIONARY COMPUTING, 2018, 579 : 57 - 63
  • [47] Wolf Search Algorithm for Numeric Association Rule Mining
    Agbehadji, Israel Edem
    Fong, Simon
    Millham, Richard
    PROCEEDINGS OF 2016 IEEE INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND BIG DATA ANALYSIS (ICCCBDA 2016), 2016, : 146 - 151
  • [48] Research on parallelization of Apriori algorithm in association rule mining
    Wang, Huan-Bin
    Gao, Yang-Jun
    PROCEEDINGS OF THE 10TH INTERNATIONAL CONFERENCE OF INFORMATION AND COMMUNICATION TECHNOLOGY, 2021, 183 : 641 - 647
  • [49] A distributed frequent itemset mining algorithm using Spark for Big Data analytics
    Feng Zhang
    Min Liu
    Feng Gui
    Weiming Shen
    Abdallah Shami
    Yunlong Ma
    Cluster Computing, 2015, 18 : 1493 - 1501
  • [50] A Comparison Between Rule Based and Association Rule Mining Algorithms
    Mazid, Mohammed M.
    Ali, A. B. M. Shawkat
    Tickle, Kevin S.
    NSS: 2009 3RD INTERNATIONAL CONFERENCE ON NETWORK AND SYSTEM SECURITY, 2009, : 452 - 455