On the Efficient Representation of Datasets as Graphs to Mine Maximal Frequent Itemsets

Cited by: 33
Authors
Halim, Zahid [1]
Ali, Omer [1]
Khan, Muhammad Ghufran [1]
Affiliation
[1] Ghulam Ishaq Khan Institute of Engineering Sciences and Technology, Machine Intelligence Research Group (MInG), Faculty of Computer Science and Engineering, Topi 23460, Pakistan
Keywords
Itemsets; Data mining; Databases; Data structures; Task analysis; Benchmark testing; Machine intelligence; Efficient frequent itemsets extraction; Efficient data structure; Graph utility; Maximal frequent itemsets; Association rules; Algorithm; Patterns
DOI
10.1109/TKDE.2019.2945573
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Frequent itemset mining is an active research problem in data mining and knowledge discovery. With advances in database technology and the exponential growth of stored data, there is a need for efficient approaches that can quickly extract useful information from such large datasets. Frequent itemset (FI) mining is the data mining task of finding itemsets that occur together in a transactional database with at least a given minimum frequency. Finding these FIs usually requires multiple passes over the database, which makes efficient algorithms crucial for mining FIs. This work presents a graph-based approach for representing a complete transactional database. The proposed representation stores all information in the database that is relevant to extracting FIs in a single pass. An algorithm that extracts the FIs from this graph-based structure is then presented. Experimental results compare the proposed approach with 17 related FI mining methods on six benchmark datasets and show that the proposed approach outperforms the others in execution time.
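To make the task definitions concrete, the following is a minimal sketch of a conventional level-wise (Apriori-style) search for frequent itemsets, plus a filter for maximal frequent itemsets (frequent itemsets with no frequent proper superset). It is not the authors' graph-based method; it only illustrates the definitions and the repeated database scans the abstract refers to. The function names `frequent_itemsets` and `maximal_itemsets` are hypothetical, and support is assumed to be an absolute transaction count.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset contained in at least
    min_support transactions (support as an absolute count).

    Level-wise search: a (k+1)-itemset is counted only if all of its
    k-subsets are frequent (downward closure property)."""
    txns = [frozenset(t) for t in transactions]

    # Level 1: count every single item in one scan of the transactions.
    counts = {}
    for t in txns:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(level)

    k = 1
    while level:
        # Join pairs of frequent k-itemsets into candidate (k+1)-itemsets,
        # keeping only candidates whose k-subsets are all frequent.
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(sub) in level for sub in combinations(union, k)
            ):
                candidates.add(union)

        # Count candidate supports by scanning the transactions again;
        # these repeated scans are the cost that motivates faster methods.
        level = {}
        for cand in candidates:
            support = sum(1 for t in txns if cand <= t)
            if support >= min_support:
                level[cand] = support
        result.update(level)
        k += 1
    return result


def maximal_itemsets(frequent):
    """Keep the frequent itemsets that have no frequent proper superset."""
    return {
        s: c for s, c in frequent.items()
        if not any(s < other for other in frequent)
    }
```

For example, with the five transactions `{a,b,c}`, `{a,b}`, `{a,c}`, `{b,c}`, `{a,b,c,d}` and a minimum support of 3, the frequent itemsets are the three single items `a`, `b`, `c` and the three pairs `{a,b}`, `{a,c}`, `{b,c}`, while `{a,b,c}` appears in only two transactions. The maximal frequent itemsets are therefore the three pairs, since every frequent single item is contained in a frequent pair.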
Pages: 1674-1691
Number of pages: 18