Finding interesting associations without support pruning

被引:162
作者
Cohen, E
Datar, M
Fujiwara, S
Gionis, A
Indyk, P
Motwani, R
Ullman, JD
Yang, C
机构
[1] AT&T Labs Res, Florham Pk, NJ 07932 USA
[2] Hitachi Ltd, Cupertino, CA 95014 USA
[3] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
关键词
data mining; association rules; similarity metric; min hashing; locality sensitive hashing;
D O I
10.1109/69.908981
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Association-rule mining has heretofore relied on the condition of high support to do its work efficiently. In particular, the well-known a priori algorithm is only effective when the only rules of interest are relationships that occur very frequently. However, there are a number of applications, such as data mining, identification of similar web documents, clustering, and collaborative filtering, where the rules of interest have comparatively few instances in the data. In these cases, we must look for highly correlated items, or possibly even causal relationships between infrequent items. We develop a family of algorithms for solving this problem, employing a combination of random sampling and hashing techniques. We provide analysis of the algorithms developed and conduct experiments on real and synthetic data to obtain a comparative performance analysis.
引用
收藏
页码:64 / 78
页数:15
相关论文
共 16 条
  • [1] Agarwal R., 1994, P 20 INT C VER LARG, V487, P499
  • [2] Agrawal R., 1993, SIGMOD Record, V22, P207, DOI 10.1145/170036.170072
  • [3] Brin S., 1997, SIGMOD Record, V26, P255, DOI [10.1145/253262.253327, 10.1145/253262.253325]
  • [4] On the resemblance and containment of documents
    Broder, AZ
    [J]. COMPRESSION AND COMPLEXITY OF SEQUENCES 1997 - PROCEEDINGS, 1998, : 21 - 29
  • [5] Size-estimation framework with applications to transitive closure and reachability
    Cohen, E
    [J]. JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 1997, 55 (03) : 441 - 453
  • [6] Gionis A, 1999, PROCEEDINGS OF THE TWENTY-FIFTH INTERNATIONAL CONFERENCE ON VERY LARGE DATA BASES, P518
  • [7] GOLDBERG D, 1991, COMMUN ACM, V55, P1
  • [8] Guha S., 1998, SIGMOD Record, V27, P73, DOI 10.1145/276305.276312
  • [9] Hart P.E., 1973, Pattern recognition and scene analysis
  • [10] Hellerstein Joseph M., 1997, P ACM SIGMOD INT C M