Ad-Hoc Data Processing in the Cloud

Cited by: 26
Authors
Logothetis, Dionysios [1]
Yocum, Kenneth [1]
Affiliations
[1] Univ Calif San Diego, Dept Comp Sci, La Jolla, CA 92093 USA
Source
Proceedings of the VLDB Endowment | 2008 / Vol. 1 / Issue 2
DOI
10.14778/1454159.1454204
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Ad-hoc data processing has proven to be a critical paradigm for Internet companies processing large volumes of unstructured data. However, the emergence of cloud-based computing, where storage and CPU are outsourced to multiple third parties across the globe, implies large collections of highly distributed and continuously evolving data. Our demonstration combines the power and simplicity of the MapReduce abstraction with a wide-scale distributed stream processor, Mortar. While our incremental MapReduce operators avoid data re-processing, the stream processor manages the placement and physical data flow of the operators across the wide area. We demonstrate a distributed web indexing engine against which users can submit and deploy continuous MapReduce jobs. A visualization component illustrates both the incremental indexing and index searches in real time.
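The idea of an incremental MapReduce operator that avoids re-processing can be illustrated with a minimal single-process sketch. This is not Mortar's API or the authors' implementation: the names `map_doc` and `IncrementalIndex` are hypothetical, and the example only assumes that reduce state (an inverted index) is kept between batches so that each update maps the newly arrived documents and nothing else.

```python
from collections import defaultdict


def map_doc(doc_id, text):
    """Map step (hypothetical): emit one (term, doc_id) pair per distinct term."""
    for term in set(text.lower().split()):
        yield term, doc_id


class IncrementalIndex:
    """Hypothetical incremental reducer: keeps term -> doc ids between batches."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.docs_processed = 0

    def update(self, batch):
        """Fold only the newly arrived documents into the existing index."""
        for doc_id, text in batch:
            for term, emitted_id in map_doc(doc_id, text):
                self.postings[term].add(emitted_id)
            self.docs_processed += 1

    def search(self, term):
        """Return the sorted ids of documents that contain the term."""
        return sorted(self.postings.get(term.lower(), ()))


index = IncrementalIndex()
index.update([("d1", "cloud data processing"), ("d2", "stream processing")])
# A later batch touches only the new document; d1 and d2 are not mapped again.
index.update([("d3", "cloud stream")])
```

In this sketch a search can run between updates, which loosely mirrors the record's description of index searches taking place alongside incremental indexing; placement of operators across machines is outside its scope.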
Pages: 1472-1475
Page count: 4