ffwd: delegation is (much) faster than you think

被引:38
作者
Roghanchi, Sepideh [1 ]
Eriksson, Jakob [1 ]
Basu, Nilanjana [1 ]
机构
[1] Univ Illinois, Chicago, IL 60607 USA
来源
PROCEEDINGS OF THE TWENTY-SIXTH ACM SYMPOSIUM ON OPERATING SYSTEMS PRINCIPLES (SOSP '17) | 2017年
基金
美国国家科学基金会;
关键词
QUEUE; SYNCHRONIZATION; PERFORMANCE; LOCKS;
D O I
10.1145/3132747.3132771
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
We revisit the question of delegation vs. synchronized access to shared memory, and show through analysis and demonstration that delegation can be much faster than locking under a range of common circumstances. Starting from first principles, we propose fast, fly-weight delegation (ffwd). The highly optimized design of ffwd allows it to significantly outperform prior work on delegation, while retaining the scalability advantage. In experiments with 6 benchmark applications, and 6 shared data structures, running on four different multi-socket systems with up to 128 hardware threads, we compare ffwd to a selection of lock, combining, lock-free, software transactional memory and delegation designs. Overall, we find that ffwd often offers a simple and highly competitive alternative to existing work. By definition, the performance of a fully delegated data structure is limited by the single-thread throughput of said data structure. However, due to cache effects, many data structures offer their best performance when confined to a single thread. With an efficient delegation mechanism, we approach this single-threaded performance in a multi-threaded setting. In application-level benchmarks, we see improvements up to 100% over the next best solution tested (RCL), and multiple micro-benchmarks show improvements in the 5-10x range.
引用
收藏
页码:342 / 358
页数:17
相关论文
共 86 条
  • [1] Abellan J. L., 2011, Proceedings of the 25th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2011), P893, DOI 10.1109/IPDPS.2011.87
  • [2] Provably Good Scheduling for Parallel Programs that Use Data Structures through Implicit Batching
    Agrawal, Kunal
    Fineman, Jeremy T.
    Lu, Kefu
    Sheridan, Brendan
    Sukha, Jim
    Utterback, Robert
    [J]. PROCEEDINGS OF THE 26TH ACM SYMPOSIUM ON PARALLELISM IN ALGORITHMS AND ARCHITECTURES (SPAA'14), 2014, : 84 - 95
  • [3] Al Bahra S., 2013, Queue, V11, P40, DOI DOI 10.1145/2488364.2492433
  • [4] Anderson T. E., 1990, IEEE Transactions on Parallel and Distributed Systems, V1, P6, DOI 10.1109/71.80120
  • [5] [Anonymous], 2001, DISC
  • [6] [Anonymous], 2004, Technical Report
  • [7] Concurrent Updates with RCU: Search Tree as an Example
    Arbel, Maya
    Attiya, Hagit
    [J]. PROCEEDINGS OF THE 2014 ACM SYMPOSIUM ON PRINCIPLES OF DISTRIBUTED COMPUTING (PODC'14), 2014, : 196 - 205
  • [8] Baumann A, 2009, SOSP'09: PROCEEDINGS OF THE TWENTY-SECOND ACM SIGOPS SYMPOSIUM ON OPERATING SYSTEMS PRINCIPLES, P29
  • [9] Boguslavsky L., 2012, J PARALLEL DISTRIBUT, V21, P246
  • [10] A parallel priority queue with constant time operations
    Brodal, GS
    Traff, JL
    Zaroliagis, CD
    [J]. JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 1998, 49 (01) : 4 - 21