Optimizing Near-Data Processing for Spark

Times cited: 1
Authors
Rachuri, Sri Pramodh [1]
Gantasala, Arun [1]
Emanuel, Prajeeth [1]
Gandhi, Anshul [1]
Foley, Robert [2]
Puhov, Peter [2]
Gkountouvas, Theodoros [3]
Lei, Hui [3]
Affiliations
[1] SUNY Stony Brook, Stony Brook, NY 11794 USA
[2] FutureWei, Santa Clara, CA USA
[3] OpenInfra Labs, London, England
Source
2022 IEEE 42ND INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS 2022), 2022
Funding
U.S. National Science Foundation
Keywords
resource disaggregation; near-data processing; Spark; pushdown; modeling
DOI
10.1109/ICDCS54860.2022.00067
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Resource disaggregation (RD) is an emerging paradigm for data center computing whereby resource-optimized servers are employed to minimize resource fragmentation and improve resource utilization. Apache Spark deployed under the RD paradigm employs a cluster of compute-optimized servers to run executors and a cluster of storage-optimized servers to host the data on HDFS. However, the network transfer from storage to compute cluster becomes a severe bottleneck for big data processing. Near-data processing (NDP) is a concept that aims to alleviate network load in such cases by offloading (or "pushing down") some of the compute tasks to the storage cluster. Employing NDP for Spark under the RD paradigm is challenging because storage-optimized servers have limited computational resources and cannot host the entire Spark processing stack. Further, even if such a lightweight stack could be developed and deployed on the storage cluster, it is not entirely obvious which Spark queries would benefit from pushdown, and which tasks of a given query should be pushed down to storage. This paper presents the design and implementation of a near-data processing system for Spark, SparkNDP, that aims to address the aforementioned challenges. SparkNDP works by implementing novel NDP Spark capabilities on the storage cluster using a lightweight library of SQL operators and then developing an analytical model to help determine which Spark tasks should be pushed down to storage based on the current network and system state. Simulation and prototype implementation results show that SparkNDP can help reduce Spark query execution times when compared to both the default approach of not pushing down any tasks to storage and the outright NDP approach of pushing all tasks to storage.
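The abstract's central claim, that partial pushdown can beat both "push nothing" and "push everything", follows from a simple bottleneck argument. The sketch below is a minimal illustration under stated assumptions and is NOT the analytical model from the paper. It assumes uniform task sizes, a single selectivity (the fraction of input bytes that survives an operator such as a filter), and fully overlapped storage, network, and compute stages, so the slowest stage dominates. `ClusterState`, `estimated_time`, and `best_pushdown` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    """Hypothetical snapshot of network and system state (all rates in bytes/s)."""
    net_bw: float        # available storage-to-compute network bandwidth
    storage_rate: float  # aggregate operator throughput of the (weak) storage cluster
    compute_rate: float  # aggregate processing throughput of the compute cluster


def estimated_time(k: int, n_tasks: int, task_bytes: float,
                   selectivity: float, s: ClusterState) -> float:
    """Estimate query time (s) when k of n_tasks are pushed down to storage.

    Assumption: storage processing, network transfer, and compute processing
    overlap as a pipeline, so the slowest stage determines completion time.
    """
    pushed = k * task_bytes
    not_pushed = (n_tasks - k) * task_bytes
    storage_time = pushed / s.storage_rate
    # Pushed-down tasks ship only their reduced output; the rest ship raw input.
    shipped = pushed * selectivity + not_pushed
    net_time = shipped / s.net_bw
    compute_time = shipped / s.compute_rate
    return max(storage_time, net_time, compute_time)


def best_pushdown(n_tasks: int, task_bytes: float, selectivity: float,
                  s: ClusterState) -> tuple:
    """Return (k, time) minimizing estimated time over k = 0..n_tasks."""
    candidates = ((k, estimated_time(k, n_tasks, task_bytes, selectivity, s))
                  for k in range(n_tasks + 1))
    return min(candidates, key=lambda kt: kt[1])


if __name__ == "__main__":
    # Illustrative numbers only: 10 tasks of 1 GB each, 10% selectivity,
    # 1 GB/s network, slow storage CPUs, fast compute cluster.
    state = ClusterState(net_bw=1e9, storage_rate=0.5e9, compute_rate=10e9)
    print(best_pushdown(10, 1e9, 0.1, state))  # (3, 7.3)
```

With these made-up parameters, pushing nothing is network-bound (10 s), pushing everything is bound by the storage CPUs (20 s), and pushing 3 of the 10 tasks balances the two stages at 7.3 s. The same search could be rerun whenever the measured bandwidth or storage load changes, which is the kind of state-dependent decision the abstract describes.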
Pages: 636-646
Page count: 11