Scalable algorithm for generation of attribute implication base using FP-growth and spark

Cited by: 12
Authors
Chunduri, Raghavendra Kumar [1]
Cherukuri, Aswani Kumar [1]
Affiliations
[1] Vellore Inst Technol, Sch Informat Technol & Engn, Vellore 632014, Tamil Nadu, India
Keywords
Apache Spark; Data frames; Formal concept analysis; FP-growth; Implication base; Machine learning; Resilient distributed dataset; Association rules
DOI
10.1007/s00500-021-05844-9
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Formal concept analysis (FCA) is an unsupervised machine learning technique used for knowledge discovery and representation. A major task in FCA is enumerating implications to construct the implication base. Although many efficient classical and parallel algorithms have been proposed for constructing the implication base, they are not well suited to large formal contexts because of their architectural complexity. All existing works use either the stem-base or the proper-premise approach, which finds the implication base in exponential time. Hence, we introduce a distributed algorithm that finds the implication base of larger datasets quickly, in polynomial time. In this paper, we propose a scalable algorithm that finds the implication base using the machine learning technique FP-growth and the big data processing framework Apache Spark, and we execute it on large formal contexts. Extensive experiments on real-world datasets show that the proposed algorithm improves performance metrics such as execution time, CPU usage, and memory usage. Statistical validation of the experimental results shows that the proposed algorithm has better potential for finding the implication base.
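To make the term "implication" concrete, the following is a minimal sketch in plain Python, not the authors' algorithm. It assumes a hypothetical toy formal context and enumerates premises by brute force, a stand-in for FP-growth that is only feasible on tiny inputs. An implication A → B holds when every object that has all attributes in A also has all attributes in B, which is the same as an association rule with confidence 1. The names `context`, `extent`, `closure`, and `implications` are illustrative assumptions.

```python
from itertools import combinations

# Hypothetical toy formal context: each object maps to the attributes it has.
context = {
    "o1": {"a", "b", "c"},
    "o2": {"a", "b"},
    "o3": {"b", "c"},
    "o4": {"a", "b", "c"},
}


def extent(ctx, attrs):
    """Objects that have every attribute in attrs."""
    return {obj for obj, has in ctx.items() if attrs <= has}


def closure(ctx, attrs):
    """Attributes shared by all objects that have attrs."""
    objs = extent(ctx, attrs)
    if not objs:
        # No supporting object: the closure is the full attribute set.
        return set().union(*ctx.values())
    return set.intersection(*(ctx[obj] for obj in objs))


def implications(ctx, min_support=1):
    """Brute-force list of valid implications (premise, conclusion).

    This enumerates every attribute subset, so it is exponential and
    yields a redundant set of rules, not a minimal implication base.
    """
    all_attrs = sorted(set().union(*ctx.values()))
    rules = []
    for size in range(len(all_attrs) + 1):
        for premise in combinations(all_attrs, size):
            p = set(premise)
            if len(extent(ctx, p)) < min_support:
                continue  # skip premises supported by too few objects
            conclusion = closure(ctx, p) - p
            if conclusion:
                rules.append((premise, tuple(sorted(conclusion))))
    return rules


rules = implications(context)
for premise, conclusion in rules:
    print(set(premise) or "{}", "->", set(conclusion))
```

In this toy context every object has attribute `b`, so the empty premise already implies `b`, and the other rules the sketch lists are redundant consequences of that one. Removing such redundancy is what distinguishes an implication base from a plain list of valid implications.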
Pages: 9219-9240
Page count: 22