An algebraic semigroup method for discovering maximal frequent itemsets

被引:0
作者
Liu, Jiang [1 ]
Li, Jing [1 ]
Ni, Feng [1 ]
Xia, Xiang [1 ]
Li, Shunlong [1 ]
Dong, Wenhui [1 ]
机构
[1] Univ Shanghai Sci & Technol, Dept Syst Sci, Shanghai 200093, Peoples R China
来源
OPEN MATHEMATICS | 2022年 / 20卷 / 01期
基金
中国国家自然科学基金;
关键词
maximal frequent itemset; association rule mining; generator; semigroup; PATTERNS;
D O I
10.1515/math-2022-0516
中图分类号
O1 [数学];
学科分类号
0701 ; 070101 ;
摘要
Discovering maximal frequent itemsets is an important issue and key technique in many data mining problems such as association rule mining. In the literature, generating maximal frequent itemsets proves either to be NP-hard or to have O(l(3)4(l)(m + n)) complexity in the worst case from the perspective of generating maximal complete bipartite graphs of a bipartite graph, where m, n are the item number and the transaction number, respectively, and l denotes the maximum of vertical bar C parallel to Psi(C)vertical bar/vertical bar C vertical bar + vertical bar Psi(C)vertical bar - 1), with the maximum taken over all maximal frequent itemsets C. In this article, we put forward a method for discovering maximal frequent itemsets, whose complexity is O(3mn2(beta) + 4(beta)n), lower than the known complexity both in the worst case, from the perspective of semigroup algebra, where beta is the number of items whose support is more than the minimum support threshold. Experiments also show that an algorithm based on the algebraic method performs better than the other three well-known algorithms. Meanwhile, we explore some algebraic properties with respect to items and transactions, prove that the maximal frequent itemsets are exactly the simplified generators of frequent itemsets, give a necessary and sufficient condition for a maximal i + 1-frequent itemset being a subset of a closed i-frequent itemset, and provide a recurrence formula of maximal frequent itemsets.
引用
收藏
页码:1432 / 1443
页数:12
相关论文
共 22 条
  • [1] Agrawal R., 1993, SIGMOD Record, V22, P207, DOI 10.1145/170036.170072
  • [2] Agrawal R., 1994, PROC 20 INT C VERY L, V1215, P487, DOI DOI 10.5555/645920.672836
  • [3] Bayardo R. J. Jr., 1998, SIGMOD Record, V27, P85, DOI 10.1145/276305.276313
  • [4] On maximal frequent and minimal infrequent sets in binary matrices
    Boros, E
    Gurvich, V
    Khachiyan, L
    Makino, K
    [J]. ANNALS OF MATHEMATICS AND ARTIFICIAL INTELLIGENCE, 2003, 39 (03) : 211 - 221
  • [5] Clifford A.H., 1961, ALGEBRAIC THEORY SEM
  • [6] Dhabu M. M., 2021, INFORM SYSTEMS TECHN, V285, DOI [10.1007/978-3-642-29166-1_3, DOI 10.1007/978-3-642-29166-1_3]
  • [7] ARBORICITY AND BIPARTITE SUBGRAPH LISTING ALGORITHMS
    EPPSTEIN, D
    [J]. INFORMATION PROCESSING LETTERS, 1994, 51 (04) : 207 - 211
  • [8] CL-MAX: a clustering-based approximation algorithm for mining maximal frequent itemsets
    Fatemi, Seyed Mohsen
    Hosseini, Seyed Mohsen
    Kamandi, Ali
    Shabankhah, Mahmood
    [J]. INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2021, 12 (02) : 365 - 383
  • [9] Fayyad U, 1996, AI MAG, V17, P37
  • [10] Gunopulos D., 1997, DATABASE THEORY ICDT, V1186