Reduction of Workflow Resource Consumption Using a Density-based Clustering Model

Times cited: 7

Authors:

Zhang, Qimin [1]

Kremer-Herman, Nathaniel [2]

Tovar, Benjamin [2]

Thain, Douglas [2]

Affiliations:

[1] Chinese Acad Sci, Technol & Engn Ctr Space Utilizat, Key Lab Space Utilizat, Beijing, Peoples R China

[2] Univ Notre Dame, Dept Comp Sci & Engn, Notre Dame, IN 46556 USA

Source:

PROCEEDINGS OF WORKS 2018: 13TH IEEE/ACM WORKSHOP ON WORKFLOWS IN SUPPORT OF LARGE-SCALE SCIENCE (WORKS) | 2018

Keywords:

high throughput computing (HTC); density-based clustering; automatic resource allocation; resource consumption optimization

DOI:

10.1109/WORKS.2018.00006

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

An end user running a scientific workflow will often request orders of magnitude too few or too many resources to run it. If the resource requisition is too small, jobs may fail due to resource exhaustion; if it is too large, resources are wasted even though the jobs may succeed. Ideally, a workflow would run with a near-optimal resource allocation that ensures all jobs succeed while minimizing resource waste. We present a strategy for addressing this resource allocation problem: (1) the resources consumed by each job are recorded by a resource monitoring tool; (2) a density-based clustering model is proposed to discover clusters among all jobs; (3) the maximal resource requisition of each cluster is calculated and used as the ideal allocation for that cluster. We ran experiments with a synthetic workflow of homogeneous tasks as well as the bioinformatics tools Lifemapper, SHRIMP, BWA, and BWA-GATK to capture the inherent nature of a workflow's resource consumption, the clustering enabled by the model, and its usefulness in real workflows. In Lifemapper, the minimum savings in time, cores, memory, and disk are 13.82%, 16.62%, 49.15%, and 93.89%, respectively. In SHRIMP, BWA, and BWA-GATK, the minimum savings in cores, memory, and disk are 50%, 90.14%, and 51.82%, respectively. Compared with a fixed resource allocation strategy, our approach provides a noticeable reduction in workflow resource consumption.
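The three-step strategy in the abstract (record per-job usage, cluster jobs by density, allocate each cluster its maximum) can be illustrated with a small sketch. This is not the authors' model: the record does not give its details, so the sketch swaps in textbook DBSCAN as a stand-in density-based method. The `JobUsage` schema, the `eps`/`min_pts` values, the per-dimension `scale` (MB divided by 1024), and the rule that noise jobs fall back to the global maximum are all hypothetical choices. Savings are measured against a fixed strategy that gives every job the largest observed usage, which is one assumed reading of "fixed resource allocation".

```python
from dataclasses import dataclass
from math import sqrt
from typing import Dict, List, Sequence, Tuple

Vec = Tuple[float, ...]


@dataclass(frozen=True)
class JobUsage:
    """Peak resources measured for one finished job (hypothetical schema)."""
    cores: float
    memory_mb: float
    disk_mb: float

    def as_vec(self) -> Vec:
        return (self.cores, self.memory_mb, self.disk_mb)


def _dist(a: Vec, b: Vec, scale: Vec) -> float:
    # Euclidean distance after dividing each dimension by a scale factor,
    # so cores, memory and disk contribute on comparable ranges.
    return sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, scale)))


def dbscan(points: Sequence[Vec], eps: float, min_pts: int, scale: Vec) -> List[int]:
    """Textbook DBSCAN with O(n^2) neighbor search. Label -1 marks noise."""
    n = len(points)
    labels: List = [None] * n

    def region(i: int) -> List[int]:
        # All points within eps of point i, including i itself.
        return [j for j in range(n) if _dist(points[i], points[j], scale) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster, not expanded
                continue
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster
            neighbors = region(j)
            if len(neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(neighbors)
    return labels


def per_cluster_allocation(jobs, eps=0.5, min_pts=2, scale=(1.0, 1024.0, 1024.0)):
    """Give each job the per-resource maximum observed within its cluster."""
    points = [job.as_vec() for job in jobs]
    labels = dbscan(points, eps, min_pts, scale)
    global_max = tuple(max(col) for col in zip(*points))
    cluster_max: Dict[int, Vec] = {}
    for p, lab in zip(points, labels):
        if lab != -1:
            prev = cluster_max.get(lab, p)
            cluster_max[lab] = tuple(max(a, b) for a, b in zip(prev, p))
    # Assumption: noise jobs fall back to the global maximum, the same
    # conservative value a fixed allocation strategy would use.
    return labels, [cluster_max.get(lab, global_max) for lab in labels]


def savings_vs_fixed(jobs, allocations):
    """Fraction saved per resource versus giving every job the global maximum."""
    points = [job.as_vec() for job in jobs]
    global_max = tuple(max(col) for col in zip(*points))
    fixed = [g * len(jobs) for g in global_max]
    used = [sum(col) for col in zip(*allocations)]
    return [1 - u / f for u, f in zip(used, fixed)]


# Toy data: three small jobs and two large jobs (invented for illustration).
jobs = [JobUsage(1, 500, 100), JobUsage(1, 520, 110), JobUsage(1, 510, 105),
        JobUsage(4, 8000, 2000), JobUsage(4, 8100, 2050)]
labels, alloc = per_cluster_allocation(jobs)
print(labels)  # [0, 0, 0, 1, 1]
print([round(s, 3) for s in savings_vs_fixed(jobs, alloc)])  # [0.45, 0.561, 0.568]
```

With this toy data the small and large jobs land in separate clusters, so the small jobs are no longer sized for the largest job. In practice the clusters would be learned from jobs that have already been monitored and then used to size later jobs, with `eps` and `scale` tuned per workflow.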

Pages: 1-9

Page count: 9
