Towards Memory-Optimized Data Shuffling Patterns for Big Data Analytics

Cited by: 3
Authors
Nicolae, Bogdan [1]
Costa, Carlos [2]
Misale, Claudia [2]
Katrinis, Kostas [1]
Park, Yoonho [2]
Affiliations
[1] IBM Res, Dublin, Ireland
[2] IBM TJ Watson Res Ctr, Yorktown Hts, NY USA
Source
2016 16TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID) | 2016
Keywords
big data analytics; data shuffling; memory-efficient I/O; elastic buffering; MAPREDUCE;
DOI
10.1109/CCGrid.2016.85
Chinese Library Classification number
TP301 [Theory, Methods];
Subject classification code
081202;
Abstract
Big data analytics is an indispensable tool in transforming science, engineering, medicine, healthcare, finance and ultimately business itself. With the explosion of data sizes and the need for shorter time-to-solution, in-memory platforms such as Apache Spark are gaining popularity. However, this introduces important challenges, among which data shuffling is particularly difficult: on the one hand, it is a key part of the computation that has a major impact on overall performance and scalability, so its efficiency is paramount; on the other hand, it needs to operate with scarce memory in order to leave as much memory as possible available for data caching. In this context, scheduling data transfers efficiently in a way that addresses both dimensions of the problem simultaneously is non-trivial. State-of-the-art solutions often rely on simple approaches that yield sub-optimal performance and resource usage. This paper contributes a novel shuffle data transfer strategy that dynamically adapts to the computation with minimal memory utilization, which we briefly outline as a series of design principles.
Pages: 409-412
Page count: 4