Online Association Rule Mining over Fast Data

Cited by: 17
Authors
Olmezogullari, Erdi [1]
Ari, Ismail [1]
Affiliations
[1] Ozyegin Univ, Dept Comp Engn, Istanbul, Turkey
Source
2013 IEEE INTERNATIONAL CONGRESS ON BIG DATA | 2013
Keywords
Fast data; big data; association rule mining; complex event processing; FP-Growth
DOI
10.1109/BigData.Congress.2013.77
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
To extract useful and actionable information in real-time, the information technology (IT) world is coping with big data problems today. In this paper, we present implementation details and performance results of ReCEPtor, our system for "online" Association Rule Mining (ARM) over big and fast data streams. Specifically, we added Apriori and two different FP-Growth algorithms inside the Esper Complex Event Processing (CEP) engine and compared their performance using LastFM social music site data. Our most important findings show that online ARM can generate (1) more unique rules, (2) with higher throughput, and (3) much sooner (lower latency) than offline rule mining. In addition, we have found many interesting and realistic musical preference rules such as "George Harrison -> Beatles". We demonstrate a sustained rate of approximately 15K rows/sec per core. We hope that our findings can shed light on the design and
Pages: 110-117
Page count: 8