On-Chip Traffic Regulation to Reduce Coherence Protocol Cost on a Microthreaded Many-Core Architecture with Distributed Caches

Cited by: 1

Authors:

Yang, Qiang [1]

Fu, Jian [1]

Poss, Raphael [1]

Jesshope, Chris [1]

Affiliations:

[1] Univ Amsterdam, CSA Grp, NL-1098 XH Amsterdam, Netherlands

Source:

ACM Transactions on Embedded Computing Systems | 2014 / Vol. 13

Keywords:

Design; Experimentation; Performance; Hardware coherence; distributed cache; many-core system; massive parallelism; on-chip memory network; write combination; MEMORY-SYSTEMS;

DOI:

10.1145/2567931

Chinese Library Classification:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

When hardware cache coherence scales to many cores on chip, oversaturated traffic in the shared memory system may offset the benefit of massive hardware concurrency. In this article, we investigate the cost of a write-update protocol in terms of on-chip memory network traffic, and its adverse effects on system performance, on a multithreaded many-core architecture with distributed caches. We discuss possible software and hardware solutions to alleviate the network pressure. We find that, in the context of massive concurrency, adding a write-merging buffer with 0.46% area overhead to each core improves the performance of applications with good locality and concurrency by 18.74% on average. Other applications also benefit from this addition, achieving a throughput increase of 5.93%. In addition, this improvement indicates that higher levels of concurrency per core can be exploited without impacting performance, thus tolerating latency better and giving higher processor efficiency than other solutions.
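The write-merging idea mentioned in the abstract can be illustrated with a toy model. The sketch below is not the design evaluated in the article: the class name `WriteMergingBuffer`, the 64-byte line size, the four-entry capacity, and the oldest-first eviction policy are all assumptions chosen for illustration. It only shows why coalescing stores to the same cache line reduces the number of update messages placed on the on-chip network.

```python
from collections import OrderedDict

LINE_SIZE = 64  # bytes per cache line (assumed value, not from the article)


class WriteMergingBuffer:
    """Toy model: coalesce stores to the same cache line before sending them."""

    def __init__(self, capacity=4):
        self.capacity = capacity      # number of buffered lines (assumed)
        self.entries = OrderedDict()  # line number -> {byte offset: value}
        self.messages_sent = 0        # update messages put on the network

    def write(self, addr, value):
        line = addr // LINE_SIZE
        if line not in self.entries:
            # Buffer full: push the oldest line out to make room.
            if len(self.entries) >= self.capacity:
                self._evict_oldest()
            self.entries[line] = {}
        # A store to an already-buffered line merges and sends nothing.
        self.entries[line][addr % LINE_SIZE] = value

    def _evict_oldest(self):
        # One network message carries all merged bytes of the evicted line.
        self.entries.popitem(last=False)
        self.messages_sent += 1

    def flush(self):
        while self.entries:
            self._evict_oldest()


def count_messages(addresses, capacity=4):
    """Return how many update messages a sequence of byte stores produces."""
    buf = WriteMergingBuffer(capacity)
    for addr in addresses:
        buf.write(addr, 0xFF)
    buf.flush()
    return buf.messages_sent
```

In this model, 16 byte stores to addresses 0 through 15 fall in one line and produce a single message, whereas an unmerged write-update scheme would send 16. Stores with poor locality (one per line) gain nothing, which is consistent with the abstract's observation that applications with good locality benefit most.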

References

Pages: 21

36 in total

