RHPM: Using Relative Hotness to Guide Page Migration for Hybrid Memory Systems

Cited by: 4

Authors:

Peng, Zhouxuan [1]

Feng, Dan [1]

Chen, Jianxi [1]

Hu, Jing [1]

Huang, Chuang [1]

Affiliations:

[1] Huazhong Univ Sci & Technol, Engn Res Ctr Data Storage Syst & Technol, Sch Comp Sci & Technol, Wuhan Natl Lab Optoelect, Key Lab Informat Storage, Wuhan 430074, Peoples R China

Source:

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS | 2023 / Vol. 42 / No. 08

Funding:

National Natural Science Foundation of China

Keywords:

Data tiering; heterogeneous memory systems; nonvolatile memory (NVM); page migration policy; MAIN MEMORY; DRAM; PERFORMANCE;

DOI:

10.1109/TCAD.2022.3231836

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

Modern computing systems and data-intensive applications are eager for larger and faster memory. Building hybrid memory systems with different memory technologies has become a dominant trend to satisfy these demands. For hybrid memory systems, page migration schemes that dynamically migrate frequently accessed hot pages into faster memory are crucial for improving performance. However, existing migration schemes are either too aggressive, resulting in unnecessary extra traffic, or too conservative to quickly adapt to changes in access patterns. Besides, the extra latency introduced by querying metadata is often ignored or handled in an unscalable manner. In this article, we propose a relative hotness page migration (RHPM) strategy, which discovers hot pages in a set of pages by competing with each other rather than comparing with a threshold. The migration is performed only when a new page wins the competition. To overlap latency due to access metadata, RHPM fetches metadata and data in parallel. In addition, it enables a small metadata buffer to speed up metadata access. Compared to the state-of-the-art scheme, RHPM requires only 1/512 of the on-chip capacity, significantly reducing on-chip hardware overhead. Evaluation of RHPM with simulations of 25 workloads shows that RHPM outperforms the state-of-the-art scheme by an average of 13.34% in performance and saves 44.19% on energy, demonstrating better resilience to changes in access patterns.
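The abstract contrasts threshold-based migration with competition inside a set of pages. The following is a minimal sketch of that idea only, not the RHPM design itself: the `PageSet` class, its per-page access counters, and the "strictly hotter than the coldest fast-memory page" rule are assumptions chosen for illustration, and details such as counter decay, parallel metadata fetch, and the metadata buffer are left out.

```python
from dataclasses import dataclass, field


@dataclass
class PageSet:
    """Hypothetical group of pages competing for a few fast-memory slots."""
    fast_slots: int
    fast: dict = field(default_factory=dict)  # page -> access count (in fast memory)
    slow: dict = field(default_factory=dict)  # page -> access count (in slow memory)
    migrations: int = 0                       # swaps between slow and fast memory

    def access(self, page: str) -> None:
        # Hits in fast memory only update the page's hotness counter.
        if page in self.fast:
            self.fast[page] += 1
            return

        count = self.slow.get(page, 0) + 1
        self.slow[page] = count

        # Assumption: an empty fast slot is filled without a competition
        # and is not counted as a migration.
        if len(self.fast) < self.fast_slots:
            self.fast[page] = self.slow.pop(page)
            return

        # Relative hotness: the accessed page competes with the coldest
        # page currently in fast memory; no fixed threshold is involved.
        victim = min(self.fast, key=self.fast.get)
        if count > self.fast[victim]:
            self.slow[victim] = self.fast.pop(victim)
            self.fast[page] = self.slow.pop(page)
            self.migrations += 1


if __name__ == "__main__":
    pages = PageSet(fast_slots=1)
    for p in ["A", "A", "A", "B", "B", "B", "B"]:
        pages.access(p)
    print(pages.fast, pages.slow, pages.migrations)
```

In this sketch a page is migrated only after it has been accessed more often than the page it would displace, so a short burst of accesses to a cold page causes no traffic, while a sustained shift in the access pattern eventually wins the competition.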

Citation

Pages: 2514-2526

Page count: 13

References: 43 in total
