Reducing overheads for acquiring dynamic memory traces
Authors:
Gao, XF [1]
Laurenzano, M [1]
Simon, B [1]
Snavely, A [1]
Affiliation:
[1] San Diego Supercomp Ctr, San Diego, CA 92093 USA
Source:
IISWC - 2005: PROCEEDINGS OF THE 2005 IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION, 2005
DOI:
Not available
Chinese Library Classification (CLC) number:
TP3 [Computing technology, computer technology]
Discipline classification code:
0812
Abstract:
Tools for acquiring dynamic memory address information for large-scale applications are important for performance modeling, optimization, and trace-driven simulation. However, straightforward use of binary instrumentation tools for such a fine-grained task as address tracing can cause astonishing slowdown in application run time. For example, in a large-scale FY05 collaboration with the Department of Defense High Performance Computing Modernization Office (HPCMO), over 1 million processor hours were expended in order to gather address information on 7 parallel applications. In this work, we discuss in detail the issues surrounding the performance of memory address acquisition using low-level binary instrumentation tracing. We present three techniques and optimizations to improve performance: 1) SimPoint-guided sampling, 2) instrumentation tool routine optimization, and 3) reduction of instrumentation points through static application analysis. The use of these three techniques together reduces instrumented application slowdown by an order of magnitude. The techniques are generally applicable and have been deployed in the MetaSim tracer, thereby enabling memory address acquisition for real-sized applications. We expect the optimizations reported here will reduce the HPCMO effort by approximately 80% in FY06.