Reduction of Workflow Resource Consumption Using a Density-based Clustering Model

Cited by: 7
Authors
Zhang, Qimin [1]
Kremer-Herman, Nathaniel [2]
Tovar, Benjamin [2]
Thain, Douglas [2]
Affiliations
[1] Chinese Acad Sci, Technol & Engn Ctr Space Utilizat, Key Lab Space Utilizat, Beijing, Peoples R China
[2] Univ Notre Dame, Dept Comp Sci & Engn, Notre Dame, IN 46556 USA
Source
PROCEEDINGS OF WORKS 2018: 13TH IEEE/ACM WORKSHOP ON WORKFLOWS IN SUPPORT OF LARGE-SCALE SCIENCE (WORKS) | 2018
Keywords
high throughput computing (HTC); density-based clustering; automatic resource allocation; resource consumption optimization;
DOI
10.1109/WORKS.2018.00006
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
An end user running a scientific workflow will often request orders of magnitude too few or too many resources. If the resource requisition is too small, a job may fail due to resource exhaustion; if it is too large, the job may succeed but resources are wasted. Ideally, the workflow would run with a near-optimal amount of resources, so that all jobs succeed and resource waste is minimized. We present a strategy for addressing this resource allocation problem: (1) the resources consumed by each job are recorded by a resource monitor tool; (2) a density-based clustering model is proposed for discovering clusters among all jobs; (3) a maximal resource requisition is calculated as the ideal allocation for each cluster. We ran experiments with a synthetic workflow of homogeneous tasks as well as the bioinformatics tools Lifemapper, SHRIMP, BWA, and BWA-GATK to capture the inherent nature of a workflow's resource consumption, the clustering allowed by the model, and its usefulness in real workflows. In Lifemapper, the minimum savings in time, cores, memory, and disk are 13.82%, 16.62%, 49.15%, and 93.89%, respectively. In SHRIMP, BWA, and BWA-GATK, the minimum savings in cores, memory, and disk are 50%, 90.14%, and 51.82%, respectively. Compared with a fixed resource allocation strategy, our approach provides a noticeable reduction in workflow resource consumption.
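The three steps in the abstract can be illustrated with a small sketch. The abstract does not name the specific clustering algorithm, so the code below assumes a textbook DBSCAN; the job measurements, the min-max normalization, the `eps` and `min_pts` values, and all function names are hypothetical choices for illustration, not the authors' implementation. Treating noise points as their own group is likewise an assumption of this sketch.

```python
from math import dist

# Step 1 (assumed input): per-job peak usage as (cores, memory_mb, disk_mb).
# These values are invented for illustration, not measurements from the paper.
JOBS = [
    (1, 200, 50), (1, 210, 55), (1, 190, 52), (1, 205, 48),
    (4, 3000, 900), (4, 3100, 950), (4, 2950, 920),
    (16, 20000, 100),  # isolated job, ends up labelled as noise
]

NOISE = -1


def normalize(points):
    """Min-max scale each dimension to [0, 1] so no resource dominates distance."""
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(p, lows, highs))
        for p in points
    ]


def dbscan(points, eps, min_pts):
    """Step 2: textbook DBSCAN; returns one label per point (NOISE for outliers)."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nbrs)
        cluster += 1
    return labels


def cluster_allocations(jobs, labels):
    """Step 3: per cluster, take the per-dimension maximum observed usage."""
    alloc = {}
    for job, label in zip(jobs, labels):
        prev = alloc.get(label, job)
        alloc[label] = tuple(max(a, b) for a, b in zip(prev, job))
    return alloc


labels = dbscan(normalize(JOBS), eps=0.1, min_pts=3)
allocations = cluster_allocations(JOBS, labels)
print(labels)
print(allocations)
```

With this toy data, a single fixed allocation sized for the largest job would reserve 16 cores and 20000 MB of memory for every job, whereas the per-cluster maxima reserve far less for the two groups of small jobs.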
Pages: 1-9
Number of pages: 9