Improving the I/O Performance in the Reduce Phase of Hadoop

Cited by: 14
Authors
Fujishima, Eita [1]
Yamaguchi, Saneyasu [1]
Affiliations
[1] Kogakuin Univ, Elect Engn & Elect, Grad Sch, Tokyo, Japan
Source
PROCEEDINGS OF 2015 THIRD INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR) | 2015
Keywords
MapReduce; Hadoop; filesystem
DOI
10.1109/CANDAR.2015.24
Chinese Library Classification number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
Hadoop is a popular open-source MapReduce implementation. In jobs where the output files of all the relevant Map tasks are transmitted to and consolidated in a single Reduce task, such as TeraSort, that single Reduce task is the bottleneck and is I/O-bound because it must process many large output files. In most cases, including TeraSort, the intermediate data, which include the output files of the Map tasks, are large and are accessed sequentially. Improving the performance of these jobs therefore requires increasing sequential access performance. In this paper, we focus on the Hadoop sample job TeraSort, which is a single-Reduce-task job, and discuss a method for improving its performance. First, we run TeraSort and demonstrate that the single Reduce task is the bottleneck and is I/O-bound. Second, we show the sequential I/O speed of each zone of an HDD. Third, we propose a method for improving the performance of such single-Reduce-task jobs. The proposed method controls the block bitmaps of the filesystem and stores the intermediate files in a faster zone, i.e., the outer range, of the HDD. Lastly, we present a performance evaluation with HDFS block sizes of 64 MB and 128 MB and demonstrate that our method improves performance.
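The per-zone sequential I/O measurement mentioned in the abstract could be approximated along the following lines. This is a minimal sketch and not the authors' measurement tool: the function names `sequential_read_mbps` and `zone_profile`, the number of zones, and the sample size are all assumptions chosen for illustration. It reads a fixed amount of data sequentially at evenly spaced offsets and reports throughput for each offset. On a real HDD the target would be a raw block device (which needs root privileges), and the page cache would have to be bypassed or dropped for the numbers to reflect the disk rather than memory.

```python
import os
import time


def sequential_read_mbps(path, offset, length, chunk=1 << 20):
    """Read `length` bytes sequentially from `offset`; return throughput in MiB/s."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        remaining = length
        total = 0
        start = time.perf_counter()
        while remaining > 0:
            buf = os.read(fd, min(chunk, remaining))
            if not buf:  # reached end of file or device
                break
            total += len(buf)
            remaining -= len(buf)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    # Guard against a zero elapsed time on very small, cached reads.
    return total / (1024 * 1024) / max(elapsed, 1e-9)


def zone_profile(path, total_size, zones=8, sample_bytes=64 << 20):
    """Sample sequential read speed at `zones` evenly spaced offsets.

    Returns a list of (offset_in_bytes, MiB_per_second) tuples. The "zones"
    here are equal slices of the address range (an assumption), not the
    drive's physical recording zones.
    """
    zone_size = total_size // zones
    results = []
    for i in range(zones):
        offset = i * zone_size
        length = min(sample_bytes, zone_size)
        results.append((offset, sequential_read_mbps(path, offset, length)))
    return results
```

On a typical HDD, low logical block addresses map to the outer tracks, so a profile of this kind would be expected to show higher throughput at small offsets, which is the property the proposed method exploits.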
Pages: 82-88
Page count: 7