P4GPU: Acceleration of Programmable Data Plane Using a CPU-GPU Heterogeneous Architecture

被引：0

作者：

Li, Peilong ^{[1
]}

Luo, Yan ^{[1
]}

机构：

[1] Univ Massachusetts Lowell, Dept Elect & Comp Engn, Lowell, MA 01852 USA

来源：

2016 IEEE 17TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE SWITCHING AND ROUTING (HPSR) | 2016年

关键词：

Programmable Data Plane; Heterogeneous Architecture; Packet Processing; P4; IP LOOKUP;

D O I：

暂无

中图分类号：

TM [电工技术]; TN [电子技术、通信技术];

学科分类号：

0808 ; 0809 ;

摘要：

The programmability of the network data plane has become one of the most desirable features within the context of software defined networks, with P4 serving as a domain-specific language for defining data plane processing. In this work, we are motivated to address the challenges of mapping a P4 defined data plane to a heterogeneous programmable hardware architecture consisting of both a CPU and a GPU, which includes a salient parallel SIMD architecture for processing network flows. We first design a toolset that can be used to map a P4 program onto the proposed architecture. We then optimize the GPU kernel designs for "match-action" primitives and present latency-hiding techniques to reduce the overheads of CPU/GPU communication. In addition, load balancing is investigated to maximize the utilization of CPU and GPU resources. Our toolset and optimizations allow a P4 program to render promising performance on the given heterogeneous architecture. Specifically, the experimental results collected on our prototype systems show that the automatically configured GPU kernels achieve scalable lookup and classification speeds with 420 million IP lookups per second, and more than 60 million classifications per second (for 4K firewall rules).

引用

页码：168 / 175

页数：8