FirePerf: FPGA-Accelerated Full-System Hardware/Software Performance Profiling and Co-Design

Citations: 18
Authors
Karandikar, Sagar [1]
Ou, Albert [1]
Amid, Alon [1]
Mao, Howard [1]
Katz, Randy [1]
Nikolic, Borivoje [1]
Asanovic, Krste [1]
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
Source
TWENTY-FIFTH INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS (ASPLOS XXV) | 2020
Keywords
performance profiling; hardware/software co-design; FPGA-accelerated simulation; network performance optimization; agile hardware; TIMING SIMULATION; SUPPORT
DOI
10.1145/3373376.3378455
Chinese Library Classification number
TP3 [Computing Technology, Computer Technology]
Discipline classification code
0812
Abstract
Achieving high performance when developing specialized hardware/software systems requires understanding and improving not only core compute kernels, but also intricate and elusive system-level bottlenecks. Profiling these bottlenecks requires both high-fidelity introspection and the ability to run sufficiently many cycles to execute complex software stacks, a challenging combination. In this work, we enable agile full-system performance optimization for hardware/software systems with FirePerf, a set of novel out-of-band system-level performance profiling capabilities integrated into the open-source FireSim FPGA-accelerated hardware simulation platform. Using out-of-band call stack reconstruction and automatic performance counter insertion, FirePerf enables introspecting into hardware and software at appropriate abstraction levels to rapidly identify opportunities for software optimization and hardware specialization, without disrupting end-to-end system behavior like traditional profiling tools. We demonstrate the capabilities of FirePerf with a case study that optimizes the hardware/software stack of an open-source RISC-V SoC with an Ethernet NIC to achieve 8x end-to-end improvement in achievable bandwidth for networking applications running on Linux. We also deploy a RISC-V Linux kernel optimization discovered with FirePerf on commercial RISC-V silicon, resulting in up to 1.72x improvement in network performance.
Pages: 715-731
Page count: 17