ffwd: delegation is (much) faster than you think

被引：38

作者：

Roghanchi, Sepideh ^{[1
]}

Eriksson, Jakob ^{[1
]}

Basu, Nilanjana ^{[1
]}

机构：

[1] Univ Illinois, Chicago, IL 60607 USA

来源：

PROCEEDINGS OF THE TWENTY-SIXTH ACM SYMPOSIUM ON OPERATING SYSTEMS PRINCIPLES (SOSP '17) | 2017年

基金：

美国国家科学基金会;

关键词：

QUEUE; SYNCHRONIZATION; PERFORMANCE; LOCKS;

D O I：

10.1145/3132747.3132771

中图分类号：

TP31 [计算机软件];

学科分类号：

081202 ; 0835 ;

摘要：

We revisit the question of delegation vs. synchronized access to shared memory, and show through analysis and demonstration that delegation can be much faster than locking under a range of common circumstances. Starting from first principles, we propose fast, fly-weight delegation (ffwd). The highly optimized design of ffwd allows it to significantly outperform prior work on delegation, while retaining the scalability advantage. In experiments with 6 benchmark applications, and 6 shared data structures, running on four different multi-socket systems with up to 128 hardware threads, we compare ffwd to a selection of lock, combining, lock-free, software transactional memory and delegation designs. Overall, we find that ffwd often offers a simple and highly competitive alternative to existing work. By definition, the performance of a fully delegated data structure is limited by the single-thread throughput of said data structure. However, due to cache effects, many data structures offer their best performance when confined to a single thread. With an efficient delegation mechanism, we approach this single-threaded performance in a multi-threaded setting. In application-level benchmarks, we see improvements up to 100% over the next best solution tested (RCL), and multiple micro-benchmarks show improvements in the 5-10x range.

引用

页码：342 / 358

页数：17

共 86 条

[1] Abellan J. L., 2011, Proceedings of the 25th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2011), P893, DOI 10.1109/IPDPS.2011.87
[2] Provably Good Scheduling for Parallel Programs that Use Data Structures through Implicit Batching
Agrawal, Kunal
Fineman, Jeremy T.
Lu, Kefu
Sheridan, Brendan
Sukha, Jim
Utterback, Robert
[J]. PROCEEDINGS OF THE 26TH ACM SYMPOSIUM ON PARALLELISM IN ALGORITHMS AND ARCHITECTURES (SPAA'14), 2014, : 84 - 95
[3] Al Bahra S., 2013, Queue, V11, P40, DOI DOI 10.1145/2488364.2492433
[4] Anderson T. E., 1990, IEEE Transactions on Parallel and Distributed Systems, V1, P6, DOI 10.1109/71.80120
[5] [Anonymous], 2001, DISC
[6] [Anonymous], 2004, Technical Report
[7] Concurrent Updates with RCU: Search Tree as an Example
Arbel, Maya
Attiya, Hagit
[J]. PROCEEDINGS OF THE 2014 ACM SYMPOSIUM ON PRINCIPLES OF DISTRIBUTED COMPUTING (PODC'14), 2014, : 196 - 205
[8] Baumann A, 2009, SOSP'09: PROCEEDINGS OF THE TWENTY-SECOND ACM SIGOPS SYMPOSIUM ON OPERATING SYSTEMS PRINCIPLES, P29
[9] Boguslavsky L., 2012, J PARALLEL DISTRIBUT, V21, P246
[10] A parallel priority queue with constant time operations
Brodal, GS
Traff, JL
Zaroliagis, CD
[J]. JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 1998, 49 (01) : 4 - 21

← 1 2 3 4 5 6 7 8 9 →