Towards Memory-Optimized Data Shuffling Patterns for Big Data Analytics

Cited by: 3

Authors:

Nicolae, Bogdan [1]

Costa, Carlos [2]

Misale, Claudia [2]

Katrinis, Kostas [1]

Park, Yoonho [2]

Affiliations:

[1] IBM Research, Dublin, Ireland

[2] IBM T.J. Watson Research Center, Yorktown Heights, NY, USA

Source:

2016 16TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID) | 2016

Keywords:

big data analytics; data shuffling; memory-efficient I/O; elastic buffering; MapReduce

DOI:

10.1109/CCGrid.2016.85

Chinese Library Classification (CLC) number:

TP301 [Theory and methods]

Discipline classification code:

081202

Abstract:

Big data analytics is an indispensable tool in transforming science, engineering, medicine, healthcare, finance and ultimately business itself. With the explosion of data sizes and the need for shorter time-to-solution, in-memory platforms such as Apache Spark are gaining popularity. However, this introduces important challenges, among which data shuffling is particularly difficult: on the one hand, it is a key part of the computation with a major impact on overall performance and scalability, so its efficiency is paramount; on the other hand, it needs to operate with scarce memory in order to leave as much memory as possible available for data caching. In this context, scheduling data transfers efficiently so that both dimensions of the problem are addressed simultaneously is non-trivial. State-of-the-art solutions often rely on simple approaches that yield sub-optimal performance and resource usage. This paper contributes a novel shuffle data transfer strategy that dynamically adapts to the computation with minimal memory utilization, which we briefly outline as a series of design principles.


Pages: 409-412

Page count: 4

