Generating High-Performance FPGA Accelerator Designs for Big Data Analytics with Fletcher and Apache Arrow

Cited by: 1
Authors
Peltenburg, Johan [1]
van Straten, Jeroen [1]
Brobbel, Matthijs [1]
Al-Ars, Zaid [1]
Hofstee, H. Peter [1, 2]
Affiliations
[1] Delft Univ Technol, Delft, Netherlands
[2] IBM Corp, Austin, TX USA
Source
JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY | 2021, Vol. 93, Issue 05
Keywords
FPGA; Accelerator; Big data; Analytics; Fletcher; Apache Arrow
DOI
10.1007/s11265-021-01650-6
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
As big data analytics systems squeeze the last bits of performance out of CPUs and GPUs, the next near-term and widely available alternative that industry is considering for higher performance in the data center and cloud is the FPGA accelerator. We discuss several challenges a developer has to face when designing and integrating FPGA accelerators for big data analytics pipelines. On the software side, we observe complex run-time systems, hardware-unfriendly in-memory layouts of data sets, and (de)serialization overhead. On the hardware side, we observe a relative lack of platform-agnostic open-source tooling, a high design effort for data structure-specific interfaces, and a high design effort for infrastructure. The open-source Fletcher framework addresses these challenges. It is built on top of Apache Arrow, which provides a common, hardware-friendly in-memory format to allow zero-copy communication of large tabular data, preventing (de)serialization overhead. Fletcher adds FPGA accelerators to the list of over eleven supported software languages. To deal with the hardware challenges, we present Arrow-specific components, providing easy-to-use, high-performance interfaces to accelerated kernels. The components are combined based on a generic architecture that is specialized according to the application through an extensive infrastructure generation framework that is presented in this article. All generated hardware is vendor-agnostic, and software drivers add a platform-agnostic layer, allowing users to create portable implementations.
Pages: 565-586
Page count: 22