A Machine Learning Approach to Mapping Streaming Workloads to Dynamic Multicore Processors

Cited by: 3
Authors
Micolet, Paul-Jules [1]
Smith, Aaron [1,2]
Dubach, Christophe [1]
Affiliations
[1] Univ Edinburgh, Edinburgh EH8 9YL, Midlothian, Scotland
[2] Microsoft Res, Redmond, WA USA
Keywords
Machine Learning; Dynamic Multicore Processor; Streaming Programming Languages; EXECUTION
DOI
10.1145/2907950.2907951
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Dataflow programming languages facilitate the design of data-intensive programs such as streaming applications commonly found in embedded systems. They also expose parallelism that can be exploited using multicore processors, which are now part of the mobile landscape. In recent years a shift has occurred towards heterogeneity (e.g. ARM big.LITTLE) and reconfigurability. Dynamic Multicore Processors (DMPs) bridge the gap between fully reconfigurable processors and homogeneous multicore systems. They can re-allocate their resources at runtime to create larger, more powerful logical processors fine-tuned to the workload. Unfortunately, there exists no accurate method to determine how to partition the cores in a DMP among application threads. Often programmers rely on analyzing the application manually and using a set of hand-picked heuristics. This leads to sub-optimal performance, reducing the potential of DMPs. What is needed is a way to determine the optimal partitioning and grouping of resources to maximize performance. As a first step, this paper studies the effect of thread partitioning and hardware resource allocation on a set of StreamIt applications. We show that the resulting space is not trivial and exhibits a large performance variation depending on the combination of parameters. We introduce a machine-learning-based methodology to tackle the space complexity. Our machine-learning model is able to directly predict the best combination of parameters using static code features. The predicted set of parameters leads to performance on par with the best performance found in a space of more than 32,000 configurations per application.
Pages: 113-122
Page count: 10
Related papers
25 in total
[1]  
[Anonymous], P INT C COMP ARCH SY
[2]  
[Anonymous], 2003, CITESEERX
[3]  
Auerbach J, 2012, DES AUT CON, P271
[4]  
Bell S., 2008, P 2008 IEEE INT SOL, DOI 10.1109/ISSCC.2008.4523070
[5]   The impact of dynamically heterogeneous multicore processors on thread scheduling [J].
Bower, Fred A. ;
Sorin, Daniel J. ;
Cox, Landon P. .
IEEE MICRO, 2008, 28 (03) :17-25
[6]   Brook for GPUs: Stream computing on graphics hardware [J].
Buck, I ;
Foley, T ;
Horn, D ;
Sugerman, J ;
Fatahalian, K ;
Houston, M ;
Hanrahan, P .
ACM TRANSACTIONS ON GRAPHICS, 2004, 23 (03) :777-786
[7]  
Chen Jiawen, 2005, Workshop on Graphics Hardware, P71
[8]  
Eyerman S, 2010, CONF PROC INT SYMP C, P362, DOI 10.1145/1816038.1816011
[9]   Profile-Guided Deployment of Stream Programs on Multicores [J].
Farhad, S. M. ;
Ko, Yousun ;
Burgstaller, Bernd ;
Scholz, Bernhard .
ACM SIGPLAN NOTICES, 2012, 47 (05) :79-88
[10]   A stream compiler for communication-exposed architectures [J].
Gordon, MI ;
Thies, W ;
Karczmarek, M ;
Lin, J ;
Meli, AS ;
Lamb, AA ;
Leger, C ;
Wong, J ;
Hoffmann, H ;
Maze, D ;
Amarasinghe, S .
ACM SIGPLAN NOTICES, 2002, 37 (10) :291-303