Reduction of Workflow Resource Consumption Using a Density-based Clustering Model

Cited by: 7

Authors:

Zhang, Qimin [1]

Kremer-Herman, Nathaniel [2]

Tovar, Benjamin [2]

Thain, Douglas [2]

Affiliations:

[1] Chinese Acad Sci, Technol & Engn Ctr Space Utilizat, Key Lab Space Utilizat, Beijing, Peoples R China

[2] Univ Notre Dame, Dept Comp Sci & Engn, Notre Dame, IN 46556 USA

Source:

PROCEEDINGS OF WORKS 2018: 13TH IEEE/ACM WORKSHOP ON WORKFLOWS IN SUPPORT OF LARGE-SCALE SCIENCE (WORKS) | 2018

Keywords:

high throughput computing (HTC); density-based clustering; automatic resource allocation; resource consumption optimization

DOI:

10.1109/WORKS.2018.00006

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

An end user running a scientific workflow will often request orders of magnitude too few or too many resources. If the request is too small, jobs may fail due to resource exhaustion; if it is too large, resources are wasted even though the jobs may succeed. Ideally, the workflow would run with a near-optimal allocation that ensures all jobs succeed while minimizing resource waste. We present a strategy for addressing this resource allocation problem: (1) the resources consumed by each job are recorded by a resource monitoring tool; (2) a density-based clustering model is proposed for discovering clusters among all jobs; (3) a maximal resource requisition is calculated as the ideal allocation for each cluster. We ran experiments with a synthetic workflow of homogeneous tasks as well as the bioinformatics tools Lifemapper, SHRIMP, BWA, and BWA-GATK to capture the inherent nature of a workflow's resource consumption, the clustering enabled by the model, and its usefulness in real workflows. In Lifemapper, the minimum savings in time, cores, memory, and disk are 13.82%, 16.62%, 49.15%, and 93.89%, respectively. In SHRIMP, BWA, and BWA-GATK, the minimum savings in cores, memory, and disk are 50%, 90.14%, and 51.82%, respectively. Compared with a fixed resource allocation strategy, our approach provides a noticeable reduction in workflow resource consumption.
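
The three-step strategy in the abstract can be illustrated with a small sketch. The abstract does not specify the clustering model, so the code below swaps in a textbook DBSCAN-style procedure as a stand-in; it is not the authors' implementation. The function names, the `eps` and `min_pts` parameters, and the sample (memory MB, disk MB) measurements are all hypothetical. Jobs labeled as noise are simply left out here; a real system would need a separate fallback allocation for them.

```python
from typing import Dict, List, Optional, Sequence

NOISE = -1  # label for jobs that belong to no dense cluster


def region_query(points: Sequence[Sequence[float]], i: int, eps: float) -> List[int]:
    """Indices of all points within Euclidean distance eps of points[i] (i included)."""
    pi = points[i]
    return [
        j for j, pj in enumerate(points)
        if sum((a - b) ** 2 for a, b in zip(pi, pj)) <= eps * eps
    ]


def dbscan(points: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Textbook DBSCAN: returns one cluster label per point, NOISE for outliers."""
    labels: List[Optional[int]] = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
        cluster += 1
    return [lab if lab is not None else NOISE for lab in labels]


def max_allocation_per_cluster(
    points: Sequence[Sequence[float]], labels: Sequence[int]
) -> Dict[int, List[float]]:
    """Per cluster, the component-wise maximum of observed resource usage."""
    alloc: Dict[int, List[float]] = {}
    for p, lab in zip(points, labels):
        if lab == NOISE:
            continue  # assumption: noise jobs are handled elsewhere
        cur = alloc.get(lab)
        alloc[lab] = list(p) if cur is None else [max(a, b) for a, b in zip(cur, p)]
    return alloc


if __name__ == "__main__":
    # Hypothetical monitor output: (peak memory MB, peak disk MB) per job.
    usage = [(100, 500), (110, 520), (105, 510),
             (900, 4000), (920, 4100), (910, 4050),
             (5000, 100)]
    labels = dbscan(usage, eps=60, min_pts=2)
    print(labels)
    print(max_allocation_per_cluster(usage, labels))
```

Taking the per-cluster maximum means every job already observed in a cluster would fit within that cluster's allocation, while jobs in low-usage clusters are no longer sized for the workflow's global peak. In practice the resource dimensions would likely need to be normalized before clustering, since a single `eps` treats memory and disk units as comparable.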
