Comparing Energy Efficiency of CPU, GPU and FPGA Implementations for Vision Kernels

Cited by: 121

Authors:

Qasaimeh, Murad [1]

Denolf, Kristof [2]

Lo, Jack [2]

Vissers, Kees [2]

Zambreno, Joseph [1]

Jones, Phillip H. [1]

Affiliations:

[1] Iowa State Univ, Ames, IA 50011 USA

[2] Xilinx Res Labs, San Jose, CA USA

Source:

2019 IEEE INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE AND SYSTEMS (ICESS) | 2019

Keywords:

Embedded Vision; GPUs; FPGAs; CPUs; Energy Efficiency;

DOI:

10.1109/icess.2019.8782524

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Developing high performance embedded vision applications requires balancing run-time performance with energy constraints. Given the mix of hardware accelerators that exist for embedded computer vision (e.g. multi-core CPUs, GPUs, and FPGAs), and their associated vendor optimized vision libraries, it becomes a challenge for developers to navigate this fragmented solution space. To aid with determining which embedded platform is most suitable for their application, we conduct a comprehensive benchmark of the run-time performance and energy efficiency of a wide range of vision kernels. We discuss rationales for why a given underlying hardware architecture innately performs well or poorly based on the characteristics of a range of vision kernel categories. Specifically, our study is performed for three commonly used HW accelerators for embedded vision applications: ARM57 CPU, Jetson TX2 GPU and ZCU102 FPGA, using their vendor optimized vision libraries: OpenCV, VisionWorks and xfOpenCV. Our results show that the GPU achieves an energy/frame reduction ratio of 1.1-3.2x compared to the others for simple kernels. For more complicated kernels and complete vision pipelines, however, the FPGA outperforms the others with energy/frame reduction ratios of 1.2-22.3x. It is also observed that the FPGA performs increasingly better as a vision application's pipeline complexity grows.
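The abstract reports its results as energy/frame reduction ratios. As a rough illustration of how such a ratio could be derived from an average power reading and a per-frame processing time, here is a minimal Python sketch. The function names and all numeric values are hypothetical placeholders chosen for illustration; they are not measurements from the paper, and this record does not describe the paper's actual measurement methodology.

```python
def energy_per_frame(avg_power_watts: float, frame_time_seconds: float) -> float:
    """Energy per frame in joules: average power multiplied by time per frame."""
    if avg_power_watts < 0 or frame_time_seconds <= 0:
        raise ValueError("power must be non-negative and frame time positive")
    return avg_power_watts * frame_time_seconds


def reduction_ratio(baseline_joules: float, candidate_joules: float) -> float:
    """How many times less energy per frame the candidate uses than the baseline."""
    if candidate_joules <= 0:
        raise ValueError("candidate energy must be positive")
    return baseline_joules / candidate_joules


# Hypothetical figures for two unnamed platforms (illustration only).
platform_a = energy_per_frame(avg_power_watts=6.0, frame_time_seconds=0.010)
platform_b = energy_per_frame(avg_power_watts=4.0, frame_time_seconds=0.005)

ratio = reduction_ratio(platform_a, platform_b)
print(f"Platform B uses {ratio:.1f}x less energy per frame than platform A")
```

Under these assumed inputs, a platform that draws less power and also finishes each frame sooner gains on both factors, which is why the ratio can exceed the difference in power draw alone.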

References

Pages: 8

15 entries in total

[1] [Anonymous], 2013, THESIS

[2] Boppana V., 2015, 2015 IEEE Hot Chips 27 Symposium (HCS), P1, DOI 10.1109/HOTCHIPS.2015.7477457

[3] Bradski G., 2018, OPEN SOURCE COMPUTER

[4] Brugger C, 2015, ISCAIE 2015 - 2015 IEEE SYMPOSIUM ON COMPUTER APPLICATIONS AND INDUSTRIAL ELECTRONICS, P201, DOI 10.1109/ISCAIE.2015.7298356

[5] Che, Shuai; Li, Jie; Sheaffer, Jeremy W.; Skadron, Kevin; Lach, John. Accelerating compute-intensive applications with GPUs and FPGAs [J]. 2008 SYMPOSIUM ON APPLICATION SPECIFIC PROCESSORS, 2008, :101-+

[6] Che SA, 2009, I S WORKL CHAR PROC, P44, DOI 10.1109/IISWC.2009.5306797

[7] Collange S, 2009, LECT NOTES COMPUT SC, V5544, P914, DOI 10.1007/978-3-642-01970-8_92

[8] Cong, Jason; Fang, Zhenman; Lo, Michael; Wang, Hanrui; Xu, Jingxian; Zhang, Shaochong. Understanding Performance Differences of FPGAs and GPUs [J]. PROCEEDINGS 26TH IEEE ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM 2018), 2018, :93-96

[9] Cooke, Patrick; Fowers, Jeremy; Brown, Greg; Stitt, Greg. A Tradeoff Analysis of FPGAs, GPUs, and Multicores for Sliding-Window Applications [J]. ACM TRANSACTIONS ON RECONFIGURABLE TECHNOLOGY AND SYSTEMS, 2015, 8 (01)

[10] Fowers J, 2012, FPGA 12: PROCEEDINGS OF THE 2012 ACM-SIGDA INTERNATIONAL SYMPOSIUM ON FIELD PROGRAMMABLE GATE ARRAYS, P47
