Scalable Many-field Packet Classification on Multi-core Processors

Times cited: 21

Authors:

Qu, Yun R. [1]

Zhou, Shijie [1]

Prasanna, Viktor K. [1]

Affiliation:

[1] Univ So Calif, Ming Hsieh Dept Elect Engn, Los Angeles, CA 90007 USA

Source:

2013 25TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD) | 2013

Keywords:

packet classification; multi-core; performance

DOI:

10.1109/SBAC-PAD.2013.29

Chinese Library Classification:

TP3 [Computing technology, computer technology]

Discipline code:

0812

Abstract:

Packet classification matches a packet header against the predefined rules in a rule set; it is a kernel function that has been studied for decades. A recent trend in packet classification is to match a large number of packet header fields. For example, flow table lookup in Software Defined Networking (SDN) requires 15 packet header fields to be examined. Another trend is to use software-based solutions running on multi-core general-purpose processors and virtual machines. Although packet classification has been widely studied, most existing solutions on multi-core systems target classic 5-field packet classification, and their performance does not scale easily to a larger number of packet header fields. In this paper, we propose a decomposition-based packet classification approach that supports large rule sets with many packet header fields. We first use range trees and hashing to search each field of the input packet header individually and in parallel. The partial results from all the fields are represented as bit vectors, which are merged in parallel to produce the final match result. We also balance the search and merge latencies and employ software pipelining to further improve overall performance. We implement our approach on state-of-the-art multi-core processors and evaluate its throughput and latency for rule set sizes ranging from 1K to 32K. Experimental results show that, for a 32K rule set, our algorithms achieve an average processing latency of 2000 ns per packet and an overall throughput of 30 million packets per second on a state-of-the-art 16-core platform.
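The decomposition-based idea summarized in the abstract can be illustrated with a small sequential sketch. It assumes a hypothetical three-field rule format (source port range, destination port range, exact-match protocol) and uses a sorted list of interval boundaries searched by binary search as a simplified stand-in for a range tree; exact-match fields use a Python dict as the hash table. The merge is assumed to be a bitwise AND of the per-field bit vectors, with the lowest rule index treated as the highest priority, as in the classic bit-vector scheme. The names `RangeField`, `ExactField`, `classify`, and `RULES` are illustrative and are not the authors' code.

```python
from bisect import bisect_right


class RangeField:
    """Per-field search for range-match fields (e.g. port ranges).

    Simplified stand-in for a range tree: the range endpoints split the
    field's domain into elementary intervals, each labeled with a bit
    vector of the rules whose range covers it. Lookup is a binary search.
    """

    def __init__(self, ranges, domain_max):
        points = {0, domain_max + 1}
        for lo, hi in ranges:
            points.add(lo)
            points.add(hi + 1)
        self.starts = sorted(points)
        # Bit i is set if rule i's range covers the interval beginning at `start`.
        self.bvs = []
        for start in self.starts:
            bv = 0
            for i, (lo, hi) in enumerate(ranges):
                if lo <= start <= hi:
                    bv |= 1 << i
            self.bvs.append(bv)

    def lookup(self, value):
        # Index of the elementary interval containing `value` (assumes 0 <= value <= domain_max).
        k = bisect_right(self.starts, value) - 1
        return self.bvs[k]


class ExactField:
    """Per-field search for exact-match fields using a hash table (dict).

    A rule value of None means wildcard (matches any value).
    """

    def __init__(self, values):
        self.wildcard = 0
        self.table = {}
        for i, v in enumerate(values):
            if v is None:
                self.wildcard |= 1 << i
            else:
                self.table[v] = self.table.get(v, 0) | (1 << i)

    def lookup(self, value):
        return self.table.get(value, 0) | self.wildcard


def classify(fields, header, n_rules):
    """Search each field independently, then merge the partial bit vectors.

    Returns the index of the highest-priority matching rule (lowest index), or None.
    """
    bv = (1 << n_rules) - 1
    for field, value in zip(fields, header):
        bv &= field.lookup(value)  # merge step: bitwise AND of partial results
    if bv == 0:
        return None
    return (bv & -bv).bit_length() - 1  # position of the lowest set bit


# Hypothetical 3-field rules: (src_port range, dst_port range, protocol or None).
RULES = [
    ((0, 1023), (80, 80), 6),        # rule 0: highest priority
    ((0, 65535), (0, 65535), 17),    # rule 1
    ((0, 65535), (0, 65535), None),  # rule 2: default, matches everything
]
FIELDS = [
    RangeField([r[0] for r in RULES], 65535),
    RangeField([r[1] for r in RULES], 65535),
    ExactField([r[2] for r in RULES]),
]

print(classify(FIELDS, (1000, 80, 6), len(RULES)))   # 0
print(classify(FIELDS, (5000, 53, 17), len(RULES)))  # 1
print(classify(FIELDS, (5000, 443, 1), len(RULES)))  # 2
```

The per-field lookups do not depend on one another, which is what makes it possible to run them in parallel on separate cores; this sketch runs them one after another and omits the latency balancing and software pipelining described in the abstract.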

Pages: 33-40

Number of pages: 8

Number of references: 22
