High Performance Memory Requests Scheduling Technique for Multicore Processors

Cited by: 3

Authors:

El-Reedy, Walid [1]

El-Moursy, Ali A. [2]

Fahmy, Hossam A. H. [1]

Affiliations:

[1] Cairo Univ, Cairo, Egypt

[2] Univ Sharjah, Elect & Comp Engn, Sharjah, U Arab Emirates

Source:

2012 IEEE 14TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS & 2012 IEEE 9TH INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE AND SYSTEMS (HPCC-ICESS) | 2012

Keywords:

Computer architecture; Memory management; Multicore processing

DOI:

10.1109/HPCC.2012.26

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

In modern computer systems, long memory latency is one of the main bottlenecks that micro-architects face in improving system performance, especially for memory-intensive applications. This emphasizes the importance of memory access scheduling for using memory bandwidth efficiently. Moreover, multithreaded and multicore designs have become the default choice in recent microprocessors, which increases contention for memory. Hence, the choice of memory access scheduling scheme is even more critical to overall performance. Although memory access scheduling techniques have recently been proposed for performance improvement, most of them overlook fairness among the running applications. Achieving both high throughput and fairness simultaneously is challenging. In this paper, we focus on the basic idea of memory request scheduling, which includes how to assign priorities to threads, which request should be served first, and how to achieve fairness among the running applications on multicore microprocessors. We propose two new memory access scheduling techniques, FLRMR and FIQMR. Compared to recently proposed techniques, on average, FLRMR achieves an 8.64% speedup relative to the LREQ algorithm, and FIQMR achieves an 11.34% speedup relative to the IQ-based algorithm. FLRMR outperforms the best of the other techniques by 8.1% on 8-core workloads. Moreover, FLRMR improves fairness over LREQ by 77.2% on average.

Pages: 127-134

Page count: 8
