P4GPU: Accelerate Packet Processing of a P4 Program with a CPU-GPU Heterogeneous Architecture

Cited by: 15

Authors:

Li, Peilong [1]

Luo, Yan [1]

Affiliations:

[1] Univ Massachusetts, Dept Elect & Comp Engn, Lowell, MA 01854 USA

Source:

PROCEEDINGS OF THE 2016 SYMPOSIUM ON ARCHITECTURES FOR NETWORKING AND COMMUNICATIONS SYSTEMS (ANCS'16) | 2016

Funding:

U.S. National Science Foundation;

Keywords:

GPU; Heterogeneous; Packet Processing; P4;

DOI:

10.1145/2881025.2889480

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

The P4 language is an emerging domain-specific language for describing data plane processing at a network device. P4 has been mapped to a wide range of forwarding devices, including NPUs, programmable NICs, and FPGAs, but not to the General Purpose Graphics Processing Unit (GPGPU), a salient parallel architecture for processing network flows. In this work, we design a heterogeneous architecture with both CPU and GPU as a P4 programming target, and present a toolset to map a P4 program onto the proposed architecture. Our evaluation reveals that a P4 program can achieve promising performance on such an architecture by parallelizing its "match+action" engine with the GPGPU accelerator. The experimental results show that the auto-configured GPU kernels achieve scalable lookup and classification speeds: the prototype system can reach up to 580 Gbps for IP lookups (64-byte packets) and 60 million classifications per second for 4k firewall rules.

Pages: 125-126

Page count: 2