Memory Management in NUMA Multicore Systems: Trapped between Cache Contention and Interconnect Overhead

Cited by: 55
Authors
Majo, Zoltan [1]
Gross, Thomas R. [1]
Affiliation
[1] ETH, Dept Comp Sci, Zurich, Switzerland
Keywords
Performance; Algorithms; Experimentation; NUMA; multicore processors; shared resource contention; memory allocation
DOI
10.1145/2076022.1993481
Chinese Library Classification
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Multiprocessors based on processors with multiple cores usually include a non-uniform memory architecture (NUMA); even current 2-processor systems with 8 cores exhibit non-uniform memory access times. As the cores of a processor share a common cache, the issues of memory management and process mapping must be revisited. We find that optimizing only for data locality can counteract the benefits of cache contention avoidance and vice versa. Therefore, system software must take both data locality and cache contention into account to achieve good performance, and memory management cannot be decoupled from process scheduling. We present a detailed analysis of a commercially available NUMA-multicore architecture, the Intel Nehalem. We describe two scheduling algorithms: maximum-local, which optimizes for maximum data locality, and its extension, N-MASS, which reduces data locality to avoid the performance degradation caused by cache contention. N-MASS is fine-tuned to support memory management on NUMA-multicores and improves performance up to 32%, and 7% on average, over the default setup in current Linux implementations.
Pages: 11-32
Page count: 22