Reducing I/O Cost in OLAP Query Processing with MapReduce

Cited by: 0
Authors
Kang, Woo-Lam [1]
Kim, Hyeon-Gyu [2]
Lee, Yoon-Joon [1]
Affiliations
[1] Korea Adv Inst Sci & Technol, Dept Comp Sci, Taejon 305701, South Korea
[2] Sahmyook Univ, Dept Comp Engn, Seoul 139742, South Korea
Source
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS | 2015 / Vol. E98D / No. 02
Keywords
MapReduce; Hadoop; OLAP; data warehouse; TPC-H benchmark
DOI
10.1587/transinf.2014EDL8143
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
This paper presents a method to reduce I/O cost in MapReduce when online analytical processing (OLAP) queries are used for data analysis. The method is based on two ideas. First, to reduce network transmission cost, mappers are organized to receive only the data needed to perform their map tasks rather than the entire input data set. Second, to reduce storage consumption, only record IDs are stored for checkpointing, not the raw records. Experiments conducted with the TPC-H benchmark show that the proposed method is about 40% faster than Hive, the well-known data warehouse solution for MapReduce, while reducing the size of data stored for checkpointing to about 80%.
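The two ideas in the abstract can be illustrated with a small in-memory Python simulation. This is a hypothetical sketch, not the authors' implementation: it does not use Hadoop or Hive, and every name in it (`project`, `map_select`, `checkpoint_raw`, `checkpoint_ids`, `restore`, and the `lineitem`-style sample rows) is invented for illustration. It assumes the base table remains readable at recovery time, which is what allows a checkpoint holding only record IDs to stand in for the raw records.

```python
import pickle
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]
Table = Dict[int, Row]  # hypothetical layout: record ID -> full row


def project(table: Table, needed: Iterable[str]) -> Table:
    """Idea 1 (simplified): ship to the mapper only the columns its task needs."""
    cols = list(needed)
    return {rid: {c: row[c] for c in cols} for rid, row in table.items()}


def map_select(part: Table, pred: Callable[[Row], bool]) -> List[int]:
    """Toy map task: return the IDs of records that satisfy a selection predicate."""
    return [rid for rid, row in part.items() if pred(row)]


def checkpoint_raw(table: Table, ids: List[int]) -> bytes:
    """Baseline: persist the full intermediate records."""
    return pickle.dumps([table[i] for i in ids])


def checkpoint_ids(ids: List[int]) -> bytes:
    """Idea 2 (simplified): persist only the record IDs."""
    return pickle.dumps(ids)


def restore(table: Table, blob: bytes) -> List[Row]:
    """Rebuild the intermediate records by re-reading the base table (assumed available)."""
    return [table[i] for i in pickle.loads(blob)]


if __name__ == "__main__":
    # Hypothetical sample rows loosely modelled on TPC-H lineitem columns.
    table: Table = {
        i: {
            "l_quantity": i % 50,
            "l_extendedprice": 100.0 * i,
            "l_shipdate": f"1995-01-{i % 28 + 1:02d}",
            "l_comment": "x" * 40,
        }
        for i in range(1000)
    }
    part = project(table, ["l_quantity"])  # mapper sees one column, not four
    ids = map_select(part, lambda r: r["l_quantity"] < 10)
    raw_blob = checkpoint_raw(table, ids)
    id_blob = checkpoint_ids(ids)
    assert restore(table, id_blob) == [table[i] for i in ids]
    print("selected:", len(ids), "raw bytes:", len(raw_blob), "id bytes:", len(id_blob))
```

The trade-off in this sketch is that an ID-only checkpoint is smaller but requires re-reading the base data during recovery; the byte counts printed here come from toy data and say nothing about the paper's reported results.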
Pages: 444–447
Page count: 4