Generating High-Performance FPGA Accelerator Designs for Big Data Analytics with Fletcher and Apache Arrow

Times cited: 1

Authors:

Peltenburg, Johan [1]

van Straten, Jeroen [1]

Brobbel, Matthijs [1]

Al-Ars, Zaid [1]

Hofstee, H. Peter [1, 2]

Affiliations:

[1] Delft Univ Technol, Delft, Netherlands

[2] IBM Corp, Austin, TX USA

Source:

JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY | 2021 / Vol. 93 / No. 5

Keywords:

FPGA; Accelerator; Big data; Analytics; Fletcher; Apache Arrow

DOI:

10.1007/s11265-021-01650-6

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

As big data analytics systems squeeze the last bits of performance out of CPUs and GPUs, the FPGA accelerator is the next near-term, widely available alternative that industry is considering for higher performance in the data center and cloud. We discuss several challenges a developer faces when designing and integrating FPGA accelerators into big data analytics pipelines. On the software side, we observe complex run-time systems, hardware-unfriendly in-memory layouts of data sets, and (de)serialization overhead. On the hardware side, we observe a relative lack of platform-agnostic open-source tooling, a high design effort for data-structure-specific interfaces, and a high design effort for infrastructure. The open-source Fletcher framework addresses these challenges. It is built on top of Apache Arrow, which provides a common, hardware-friendly in-memory format that allows zero-copy communication of large tabular data, avoiding (de)serialization overhead. Fletcher adds FPGA accelerators to the list of over eleven software languages that Arrow supports. To deal with the hardware challenges, we present Arrow-specific components that provide easy-to-use, high-performance interfaces to accelerated kernels. These components are combined according to a generic architecture, which is specialized for each application by the extensive infrastructure generation framework presented in this article. All generated hardware is vendor-agnostic, and software drivers add a platform-agnostic layer, allowing users to create portable implementations.
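To make the idea of a "hardware-friendly in-memory format" more concrete, the sketch below mimics how the Apache Arrow columnar format lays out a nullable string column: a validity bitmap (one bit per element, least-significant bit first), an offsets buffer with n+1 entries, and one contiguous values buffer. This is plain Python illustrating the layout only; it is not Arrow's or Fletcher's API, and the helper names `to_arrow_string_buffers` and `read_item` are hypothetical. Buffer alignment and padding, which Arrow also specifies, are omitted.

```python
from typing import List, Optional, Tuple


def to_arrow_string_buffers(
    items: List[Optional[str]],
) -> Tuple[bytes, List[int], bytes]:
    """Hypothetical helper: build Arrow-style (validity, offsets, values) buffers.

    In real Arrow, offsets are a contiguous int32 buffer; a Python list stands in here.
    Alignment/padding of buffers is omitted for brevity.
    """
    validity = bytearray((len(items) + 7) // 8)  # one bit per element
    offsets = [0]                                # n + 1 entries
    values = bytearray()                         # all string bytes, back to back
    for i, s in enumerate(items):
        if s is not None:
            validity[i // 8] |= 1 << (i % 8)     # Arrow uses LSB-first bit numbering
            values += s.encode("utf-8")
        # Null slots still get an offset entry; here they have zero length.
        offsets.append(len(values))
    return bytes(validity), offsets, bytes(values)


def read_item(validity: bytes, offsets: List[int], values: bytes, i: int) -> Optional[str]:
    """Hypothetical helper: random access to element i via two offset lookups."""
    if not (validity[i // 8] >> (i % 8)) & 1:
        return None
    return values[offsets[i]:offsets[i + 1]].decode("utf-8")


validity, offsets, values = to_arrow_string_buffers(["fpga", None, "arrow"])
print(offsets)                                  # [0, 4, 4, 9]
print(values)                                   # b'fpgaarrow'
print(read_item(validity, offsets, values, 2))  # arrow
```

Because element i is located purely through offsets[i] and offsets[i+1], a reader, whether software or an FPGA kernel, can stream the offsets and values buffers sequentially from memory without a deserialization step. That is the kind of property the abstract relies on when it describes zero-copy communication.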


Pages: 565-586

Page count: 22
