Large-Scale Experiment for Topology-Aware Resource Management

Times cited: 0
Authors
Georgiou, Yiannis [1]
Mercier, Guillaume [2]
Villiermet, Adele [3]
Affiliations
[1] Atos Bull, Grenoble, France
[2] Bordeaux INP, Talence, France
[3] Inria Bordeaux Sud Ouest, Talence, France
Source
EURO-PAR 2017: PARALLEL PROCESSING WORKSHOPS | 2018 / Vol. 10659
Keywords
Resource management; Job allocation; Topology-aware placement; Scheduling; SLURM; Placement
DOI
10.1007/978-3-319-75178-8_15
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
A Resource and Job Management System (RJMS) is a crucial piece of system software in the HPC stack. It is responsible for efficiently delivering computing power to applications in supercomputing environments, and its main intelligence lies in resource selection techniques that find the most suitable resources on which to schedule users' jobs. In [8], we introduced a new topology-aware resource selection algorithm that determines the best choice among the available nodes of the platform based on their position in the network and on the application's behaviour (expressed as a communication matrix). We integrated this algorithm as a plugin in SLURM and validated it with several optimization schemes by comparing it against the default SLURM algorithm. This paper presents further experiments on this selection process.
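The abstract describes node selection as combining each node's position in the network with the application's communication matrix. As a minimal sketch of that kind of objective only (this record does not specify the algorithm of [8], and the code below is not it), the following Python assumes a hypothetical two-level switch tree, one process per node, a symmetric communication matrix, and illustrative hop counts; all names (`hops`, `placement_cost`, `select_nodes`, the node and switch labels) are hypothetical. It brute-forces every assignment of processes to free nodes and keeps the one with the lowest hop-weighted communication volume.

```python
from itertools import permutations

# Hypothetical two-level tree: each node is labelled with the switch it hangs off.
# Hop counts are an illustrative assumption: 2 hops between nodes on the same
# switch (node -> switch -> node), 4 hops otherwise (via the top-level switch).
def hops(switch_of, a, b):
    if a == b:
        return 0
    return 2 if switch_of[a] == switch_of[b] else 4


def placement_cost(comm, nodes, switch_of):
    """Hop-weighted communication volume of one assignment.

    comm[i][j] is the (symmetric) volume exchanged by processes i and j;
    nodes[i] is the node that process i is placed on.
    """
    n = len(comm)
    return sum(
        comm[i][j] * hops(switch_of, nodes[i], nodes[j])
        for i in range(n)
        for j in range(i + 1, n)
    )


def select_nodes(comm, free_nodes, switch_of):
    """Pick one distinct free node per process, minimising placement_cost.

    Exhaustive and exponential in the job size: it only illustrates the
    objective, it is not a scalable selection algorithm.
    """
    best = min(
        permutations(free_nodes, len(comm)),
        key=lambda nodes: placement_cost(comm, nodes, switch_of),
    )
    return list(best), placement_cost(comm, best, switch_of)


# Hypothetical platform and job: processes 0 and 1 communicate heavily.
switch_of = {"n0": "s0", "n1": "s0", "n2": "s1", "n3": "s1"}
free = ["n0", "n2", "n3"]  # n1 is assumed to be busy
comm = [[0, 10, 1],
        [10, 0, 1],
        [1, 1, 0]]

nodes, cost = select_nodes(comm, free, switch_of)
print(nodes, cost)  # ['n2', 'n3', 'n0'] 28
```

The heavily communicating pair lands on the two free nodes that share switch `s1`, while the lightly communicating process takes the remaining node; a real RJMS plugin would replace the exhaustive search with a heuristic.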
Pages: 179–186
Number of pages: 8
References
15 in total
  • [1] Adaptive Computing, ADAPTIVE COMPUTING T
  • [2] [Anonymous], ORACLE GRID ENGINE
  • [3] Balle Susanne M., 2007, Job Scheduling Strategies for Parallel Processing. 13th International Workshop, JSSPP 2007. Revised papers, P37
  • [4] Bosilca G., 2017, 23 INT EUROPEAN C PA, P12
  • [5] Capit N, 2005, 2005 IEEE INTERNATIONAL SYMPOSIUM ON CLUSTER COMPUTING AND THE GRID, VOLS 1 AND 2, P776
  • [6] Fujitsu, INT TOP AW RES ASS
  • [7] Georgiou Y., 2013, LNCS, V7698, P134, DOI 10.1145/3005745.3005750; 10.1007/978-3-642-35867
  • [8] Georgiou Y., Jeannot E., Mercier G., Villiermet A., 2017, Topology-aware resource management for HPC applications, 18TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING AND NETWORKING (ICDCN 2017)
  • [9] Georgiou Y., Jeannot E., Mercier G., Villiermet A., 2018, Topology-aware job mapping, INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS, V32, N1, P14-27
  • [10] Jeannot E., Mercier G., Tessier F., 2014, Process Placement in Multicore Clusters: Algorithmic Issues and Practical Techniques, IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, V25, N4, P993-1002