Exploiting Semantics of Virtual Memory to Improve the Efficiency of the On-Chip Memory System

Citations: 0
Authors
Li, Bin [1]
Fang, Zhen [2]
Zhao, Li [1]
Jiang, Xiaowei [1]
Li, Lin [1]
Herdrich, Andrew [1]
Iyer, Ravishankar [1]
Makineni, Srihari [1]
Affiliations
[1] Intel Corp, Hillsboro, OR 97124 USA
[2] Nvidia, Austin, TX 78717 USA
Source
EURO-PAR 2012 PARALLEL PROCESSING | 2012 / Vol. 7484
Keywords
POWER; MODEL
DOI
Not available
Chinese Library Classification
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Different virtual memory regions (e.g., stack and heap) have different properties and characteristics. For example, stack data are thread-private by definition, while heap data can be shared between threads. Compared with heap memory, stack memory tends to receive a large number of accesses to a rather small number of pages. These facts have been largely ignored by designers. In this paper, we propose two novel designs that exploit stack memory's unique characteristics to optimize the on-chip memory system. The first design is Anticipatory Superpaging: superpages are created automatically for stack memory at the first page fault in a potential superpage, increasing TLB reach and reducing TLB misses. It is transparent to applications and does not require the kernel to employ online analysis algorithms or page copying. The second design is Stack-Aware Cache Placement: stack accesses are routed to their local slices in a distributed shared cache, while non-stack accesses are still routed using cacheline interleaving. The primary benefit of this mechanism is reduced power consumption of the on-chip interconnect. Our simulation shows that the first innovation reduces TLB misses by 10% - 20%, and the second one reduces interconnect power consumption by over 14%.
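The two mechanisms named in the abstract can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names, the 64-byte cache line, the 2 MiB superpage size, one cache slice per core, and the rule that a superpage is only created when it fits entirely inside the stack region.

```python
LINE_BYTES = 64             # assumed cache line size
SUPERPAGE_BYTES = 2 << 20   # assumed superpage size (2 MiB)


def home_slice(addr, core_id, is_stack, num_slices):
    """Pick the shared-cache slice that holds the line containing addr.

    Assumes one slice per core, so the requesting core's local slice
    has the same index as the core itself.
    """
    if is_stack:
        # Stack data are thread-private: keep them in the local slice.
        return core_id
    # Non-stack data: plain cacheline interleaving across all slices.
    return (addr // LINE_BYTES) % num_slices


def superpage_for_fault(fault_addr, stack_lo, stack_hi):
    """Return the base of the superpage to map on a stack page fault.

    Returns None when the fault is outside [stack_lo, stack_hi) or when
    the aligned superpage would not fit inside that region (an assumed
    policy, chosen only to keep the sketch simple).
    """
    if not (stack_lo <= fault_addr < stack_hi):
        return None
    base = fault_addr & ~(SUPERPAGE_BYTES - 1)
    if base < stack_lo or base + SUPERPAGE_BYTES > stack_hi:
        return None
    return base
```

For example, under these assumptions `home_slice(0x1040, 3, True, 8)` selects slice 3 (the local slice of core 3), while the same address as a non-stack access is interleaved by line index.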
Pages: 232-245
Page count: 14