Efficient Big Data Processing in Hadoop MapReduce

Cited by: 125
Authors
Dittrich, Jens [1, 2]
Quiane-Ruiz, Jorge-Arnulfo [1]
Affiliations
[1] Saarland Univ, Informat Syst Grp, Saarbrucken, Germany
[2] Saarland Univ, Comp Sci Databases, Saarbrucken, Germany
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2012 / Vol. 5 / No. 12
Keywords
26;
DOI
10.14778/2367502.2367562
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
This tutorial is motivated by the clear need of many organizations, companies, and researchers to process big data volumes efficiently, for example in web analytics, scientific applications, and social networks. A popular engine for processing big data is Hadoop MapReduce. Early versions of Hadoop MapReduce suffered from severe performance problems, but these problems are increasingly becoming a thing of the past: many techniques can be applied to Hadoop MapReduce jobs to boost performance by orders of magnitude. In this tutorial we teach such techniques. First, we will briefly familiarize the audience with Hadoop MapReduce and motivate its use for big data processing. Then, we will focus on different data management techniques, ranging from job optimization to physical data organization such as data layouts and indexes. Throughout the tutorial, we will highlight the similarities and differences between Hadoop MapReduce and parallel DBMSs. Furthermore, we will point out unresolved research problems and open issues.
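As an illustration of the programming model the abstract refers to, here is a minimal, single-process Python sketch of a MapReduce word count. It is not Hadoop's Java API and not the tutorial's own material; the names `map_words`, `sum_counts`, `group_by_key`, and `run_job` are hypothetical. It also shows one simple job-level optimization, a combiner that pre-aggregates each map task's output before the shuffle. The abstract does not list which specific optimizations the tutorial covers, so treat this only as a sketch of the general idea.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Tuple

# Hypothetical type aliases for this sketch (not part of any Hadoop API).
KV = Tuple[str, int]
Mapper = Callable[[str], Iterable[KV]]
Reducer = Callable[[str, List[int]], Iterable[KV]]


def map_words(line: str) -> Iterator[KV]:
    """Map function: emit (word, 1) for every whitespace-separated token."""
    for word in line.lower().split():
        yield word, 1


def sum_counts(key: str, values: List[int]) -> Iterator[KV]:
    """Reduce function. Summation is associative and commutative,
    so the same function can also serve as a combiner."""
    yield key, sum(values)


def group_by_key(pairs: Iterable[KV]) -> Dict[str, List[int]]:
    """Collect all values emitted for the same key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def run_job(splits: List[List[str]], mapper: Mapper, reducer: Reducer,
            combiner: Optional[Reducer] = None) -> Tuple[Dict[str, int], int]:
    """Simulate one MapReduce job in a single process.

    Returns the reduce output and the number of key/value pairs that
    crossed the simulated shuffle.
    """
    shuffled: List[KV] = []
    for split in splits:  # each split would be handled by one map task
        map_output = [kv for line in split for kv in mapper(line)]
        if combiner is not None:
            # Pre-aggregate this map task's output before it is shuffled.
            local = group_by_key(map_output)
            map_output = [kv for k, vs in local.items() for kv in combiner(k, vs)]
        shuffled.extend(map_output)
    groups = group_by_key(shuffled)  # shuffle: route pairs to their key group
    result = {k: v for key in sorted(groups) for k, v in reducer(key, groups[key])}
    return result, len(shuffled)


if __name__ == "__main__":
    splits = [["the quick fox", "the lazy dog"], ["the fox"]]
    print(run_job(splits, map_words, sum_counts))
    print(run_job(splits, map_words, sum_counts, combiner=sum_counts))
```

With the two small input splits in the `__main__` block, both runs produce the same word counts, but the combiner reduces the number of shuffled pairs from 8 to 7; when each split contains many repeated keys, the savings can be much larger. In Hadoop itself, a combiner is configured on the job rather than passed as a function argument, and the real shuffle also partitions and sorts data across machines, which this sketch omits.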
Pages: 2014-2015
Page count: 2