Cosine interesting pattern discovery

被引:20
作者
Wu, Junjie [1 ]
Zhu, Shiwei [2 ]
Liu, Hongfu [1 ]
Xia, Guoping [1 ]
机构
[1] Beihang Univ, Sch Econ & Management, Beijing 100191, Peoples R China
[2] Cent Univ Finance & Econ, Sch Informat, Beijing 100081, Peoples R China
基金
中国国家自然科学基金;
关键词
Correlation computation; Interestingness measure; Cosine similarity; Conditional anti-monotone property; Set enumeration tree; HYPERCLIQUE PATTERN;
D O I
10.1016/j.ins.2011.09.006
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Recent years have witnessed an increasing interest in computing cosine similarity between high-dimensional documents, transactions, and gene sequences, etc. Most previous studies limited their scope to the pairs of items, which cannot be adapted to the multi-itemset cases. Therefore, from a frequent pattern mining perspective, there exists still a critical need for discovering interesting patterns whose cosine similarity values are above some given thresholds. However, the knottiest point of this problem is, the cosine similarity has no anti-monotone property. To meet this challenge, we propose the notions of conditional anti-monotone property and Support-Ascending Set Enumeration Tree (SA-SET). We prove that the cosine similarity has the conditional anti-monotone property and therefore can be used for the interesting pattern mining if the itemset traversal sequence is defined by the SA-SET. We also identify the anti-monotone property of an upper bound of the cosine similarity, which can be used in further pruning the candidate itemsets. An Apriori-like algorithm called CosMiner is then put forward to mine the cosine interesting patterns from large-scale multi-item databases. Experimental results show that CosMiner can efficiently identify interesting patterns using the conditional anti-monotone property of the cosine similarity and the anti-monotone property of its upper bound, even at extremely low levels of support. (C) 2011 Elsevier Inc. All rights reserved.
引用
收藏
页码:176 / 195
页数:20
相关论文
共 37 条
[1]  
Agrawal R., 1993, SIGMOD Record, V22, P207, DOI 10.1145/170036.170072
[2]  
Alexander C., 2001, MARKET MODELS GUIDE
[3]  
[Anonymous], 2005, Data Mining: Concepts and Techniques
[4]  
[Anonymous], 2002, APPL MULTIPLE REGRES, DOI DOI 10.4324/9780203774441
[5]  
[Anonymous], 2003, P FIMI 03 P IEEE ICD
[6]   An algorithm to mine general association rules from tabular data [J].
Ayubi, Siyamand ;
Muyeba, Maybin K. ;
Baraani, Ahmad ;
Keane, John .
INFORMATION SCIENCES, 2009, 179 (20) :3520-3539
[7]  
Bayardo R.J., 2007, WWW INT C WORLD WID, P131, DOI [DOI 10.1145/1242572.1242591, 10.1145/1242572.1242591]
[8]  
Bellman R. E., 1957, Dynamic programming. Princeton landmarks in mathematics
[9]   Using information-theoretic measures to assess association rule interestingness [J].
Blanchard, J ;
Guillet, F ;
Gras, R ;
Briand, H .
FIFTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2005, :66-73
[10]  
Brin S., 1997, P 1997 ACM SIGMOD IN, P265