Well-Structured Futures and Cache Locality

Cited: 0
Authors
Herlihy, Maurice [1]
Liu, Zhiyu [1]
Affiliation
[1] Brown Univ, Dept Comp Sci, Providence, RI 02912 USA
Keywords
scheduling; work stealing; futures; parallel programming; cache locality; performance bounds; PARALLELISM;
DOI
10.1145/2692916.2555257
CLC (Chinese Library Classification)
TP31 [Computer Software];
Discipline Codes
081202 ; 0835 ;
Abstract
In fork-join parallelism, a sequential program is split into a directed acyclic graph of tasks linked by directed dependency edges, and the tasks are executed, possibly in parallel, in an order consistent with their dependencies. A popular and effective way to extend fork-join parallelism is to allow threads to create futures. A thread creates a future to hold the results of a computation, which may or may not be executed in parallel. That result is returned when some thread touches that future, blocking if necessary until the result is ready. Recent research has shown that while futures can, of course, enhance parallelism in a structured way, they can have a deleterious effect on cache locality. In the worst case, futures can incur Ω(PT∞ + tT∞) deviations, which implies Ω(CPT∞ + CtT∞) additional cache misses, where C is the number of cache lines, P is the number of processors, t is the number of touches, and T∞ is the computation span. Since cache locality has a large impact on software performance on modern multicores, this result is troubling. In this paper, however, we show that if futures are used in a simple, disciplined way, then the situation is much better: if each future is touched only once, either by the thread that created it, or by a later descendant of the thread that created it, then parallel executions with work stealing can incur at most O(CPT∞²) additional cache misses, a substantial improvement. This structured use of futures is characteristic of many (but not all) parallel applications.
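The single-touch discipline described above can be illustrated with a minimal sketch. This is not the paper's runtime or its work-stealing scheduler; it uses Python's standard `concurrent.futures` API, and the `fib` workload and pool size are illustrative assumptions:

```python
from concurrent.futures import ThreadPoolExecutor

def fib(n):
    # Illustrative workload; any pure computation would do.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

with ThreadPoolExecutor(max_workers=2) as pool:
    # Create a future: the computation may run in parallel with the creator.
    fut = pool.submit(fib, 20)

    # The creating thread is free to do other work here.

    # Single touch, performed by the creating thread: blocks until ready.
    # The well-structured discipline allows exactly one touch, by the
    # creator or by one of its later descendants.
    result = fut.result()

print(result)
```

Touching the future more than once, or from an unrelated thread, would fall outside the structured pattern to which the paper's O(CPT∞²) bound applies.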
Pages: 155-166
Page count: 12