Reduction of Workflow Resource Consumption Using a Density-based Clustering Model

Cited by: 7

Authors:

Zhang, Qimin [1]

Kremer-Herman, Nathaniel [2]

Tovar, Benjamin [2]

Thain, Douglas [2]

Affiliations:

[1] Chinese Acad Sci, Technol & Engn Ctr Space Utilizat, Key Lab Space Utilizat, Beijing, Peoples R China

[2] Univ Notre Dame, Dept Comp Sci & Engn, Notre Dame, IN 46556 USA

Source:

PROCEEDINGS OF WORKS 2018: 13TH IEEE/ACM WORKSHOP ON WORKFLOWS IN SUPPORT OF LARGE-SCALE SCIENCE (WORKS) | 2018

Keywords:

high throughput computing (HTC); density-based clustering; automatic resource allocation; resource consumption optimization

DOI:

10.1109/WORKS.2018.00006

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

An end user running a scientific workflow will often request orders of magnitude too few or too many resources to run it. If the resource requisition is too small, jobs may fail due to resource exhaustion; if it is too large, jobs may succeed but resources are wasted. Ideally, the workflow would run with a near-optimal amount of resources, so that all jobs succeed while resource waste is minimized. We present a strategy for addressing this resource allocation problem: (1) the resources consumed by each job are recorded by a resource monitoring tool; (2) a density-based clustering model is proposed to discover clusters among all jobs; and (3) a maximal resource requisition is calculated as the ideal allocation for each cluster. We ran experiments with a synthetic workflow of homogeneous tasks as well as with the bioinformatics tools Lifemapper, SHRIMP, BWA, and BWA-GATK to capture the inherent nature of a workflow's resource consumption, the clustering allowed by the model, and its usefulness in real workflows. In Lifemapper, the minimum savings in time, cores, memory, and disk are 13.82%, 16.62%, 49.15%, and 93.89%, respectively. In SHRIMP, BWA, and BWA-GATK, the minimum savings in cores, memory, and disk are 50%, 90.14%, and 51.82%, respectively. Compared with a fixed resource allocation strategy, our approach provides a noticeable reduction in workflow resource consumption.
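The three-step strategy summarized in the abstract can be illustrated with a small sketch. It assumes DBSCAN as the density-based clustering algorithm (the abstract does not specify the authors' exact model), treats each job's recorded peak usage as a point, takes the per-dimension maximum within each cluster as that cluster's requisition, and treats noise jobs as singleton clusters. The function names `dbscan`, `allocations`, and `savings`, the `eps`/`min_pts` values, the sample data, and the baseline of a fixed allocation equal to the global peak are all hypothetical choices for illustration, not the paper's implementation.

```python
import math


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise (plain DBSCAN)."""
    n = len(points)
    labels = [None] * n  # None = unvisited, -1 = noise, k >= 0 = cluster id

    def region(i):
        # All points within eps of point i, including i itself.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster but is not expanded
                continue
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster
            neighbors = region(j)
            if len(neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(neighbors)
    return labels


def allocations(records, labels):
    """Per-job allocation = per-dimension maximum over the job's cluster.

    Assumption: noise jobs (label -1) are treated as singleton clusters.
    """
    def key(i, lab):
        return ("noise", i) if lab == -1 else lab

    groups = {}
    for i, (rec, lab) in enumerate(zip(records, labels)):
        groups.setdefault(key(i, lab), []).append(rec)
    peak = {k: tuple(max(col) for col in zip(*recs)) for k, recs in groups.items()}
    return [peak[key(i, lab)] for i, lab in enumerate(labels)]


def savings(records, per_job_alloc):
    """Fractional saving per dimension versus a hypothetical fixed allocation
    that gives every job the global peak observed across all jobs."""
    fixed = tuple(max(col) for col in zip(*records))
    n = len(records)
    return tuple(
        1 - sum(a[d] for a in per_job_alloc) / (n * fixed[d])
        for d in range(len(fixed))
    )


if __name__ == "__main__":
    # Hypothetical peak memory (MB) recorded for eight jobs.
    mem = [(100,), (110,), (120,), (105,), (2000,), (2100,), (2050,), (5000,)]
    labels = dbscan(mem, eps=50, min_pts=2)
    per_job = allocations(mem, labels)
    print(labels)   # [0, 0, 0, 0, 1, 1, 1, -1]
    print(per_job)  # [(120,), (120,), (120,), (120,), (2100,), (2100,), (2100,), (5000,)]
    print(f"{savings(mem, per_job)[0]:.2%}")  # 70.55%
```

With this sample data, the four small jobs are allocated 120 MB each and the three medium jobs 2100 MB each, giving roughly a 70.55% memory saving over a fixed 5000 MB per job. The sketch works retrospectively on already-recorded usage; resource dimensions with different units (cores, MB, seconds) would need scaling before Euclidean distances are meaningful.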

Pages: 1-9

Number of pages: 9