SAMOA: A Platform for Mining Big Data Streams

Citations: 0
Authors
De Francisci Morales, Gianmarco [1]
Affiliations
[1] Yahoo Res, Barcelona, Spain
Source
PROCEEDINGS OF THE 22ND INTERNATIONAL CONFERENCE ON WORLD WIDE WEB (WWW'13 COMPANION) | 2013
Keywords
Big Data; Data Streams; Stream Mining; Distributed Computing; Machine Learning; Open Source;
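DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract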
Social media and user-generated content are causing an ever-growing data deluge. The rate at which we produce data is growing steadily, creating larger and larger streams of continuously evolving data. Online news, micro-blogs, and search queries are just a few examples of these continuous streams of user activity. The value of these streams lies in their freshness and relatedness to ongoing events. However, current (de facto standard) solutions for big data analysis are not designed to deal with evolving streams. In this talk, we offer a sneak preview of SAMOA, an upcoming platform for mining big data streams. SAMOA is a platform for online mining in a cluster/cloud environment. It features a pluggable architecture that allows it to run on several distributed stream processing engines such as S4 and Storm. SAMOA includes algorithms for the most common machine learning tasks such as classification and clustering. Finally, SAMOA will soon be open sourced in order to foster collaboration and research on big data stream mining.
Pages: 777-778
Page count: 2