31 entries in total
[1]
[Anonymous], P INT C HIGH PERF CO
[2]
[Anonymous], 2017, LOGAIDER TOOL
[3]
[Anonymous], 2018, ARGONNE MIRA RAS LOG
[4]
Measuring and Understanding Extreme-Scale Application Resilience: A Field Study of 5,000,000 HPC Application Runs [J]. 2015 45TH ANNUAL IEEE/IFIP INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS, 2015: 25-36
[5]
LOGAIDER: A tool for mining potential correlations of HPC log events [J]. 2017 17TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID), 2017: 442-451
[6]
Feinberg A., 2017, 3000 PROCESSOR SUPER
[7]
Quantifying temporal and spatial correlation of failure events for proactive management [J]. SRDS 2007: 26TH IEEE INTERNATIONAL SYMPOSIUM ON RELIABLE DISTRIBUTED SYSTEMS, PROCEEDINGS, 2007: 175-+
[8]
Goldstein MM, 2010, Perspect Health Inf Manag, V7, P1
[9]
Understanding and Exploiting Spatial Properties of System Failures on Extreme-Scale HPC Systems [J]. 2015 45TH ANNUAL IEEE/IFIP INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS, 2015: 37-44
[10]
Hamerly G., 2002, Proceedings of the Eleventh International Conference on Information and Knowledge Management. CIKM 2002, P600, DOI 10.1145/584792.584890