Reduction of Workflow Resource Consumption Using a Density-based Clustering Model

Cited by: 7
Authors
Zhang, Qimin [1]
Kremer-Herman, Nathaniel [2]
Tovar, Benjamin [2]
Thain, Douglas [2]
Affiliations
[1] Chinese Acad Sci, Technol & Engn Ctr Space Utilizat, Key Lab Space Utilizat, Beijing, Peoples R China
[2] Univ Notre Dame, Dept Comp Sci & Engn, Notre Dame, IN 46556 USA
Source
PROCEEDINGS OF WORKS 2018: 13TH IEEE/ACM WORKSHOP ON WORKFLOWS IN SUPPORT OF LARGE-SCALE SCIENCE (WORKS) | 2018
Keywords
high throughput computing (HTC); density-based clustering; automatic resource allocation; resource consumption optimization;
DOI
10.1109/WORKS.2018.00006
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
An end user running a scientific workflow will often request orders of magnitude too few or too many resources for it. If the resource requisition is too small, a job may fail due to resource exhaustion; if it is too large, resources are wasted even though the job may succeed. Ideally, the workflow would run with a near-optimal allocation of resources that ensures all jobs succeed while minimizing resource waste. We present a strategy for addressing this resource allocation problem: (1) the resources consumed by each job are recorded by a resource monitoring tool; (2) a density-based clustering model is proposed for discovering clusters among all jobs; (3) the maximal resource requisition within each cluster is calculated and used as the ideal allocation for that cluster. We ran experiments with a synthetic workflow of homogeneous tasks as well as the bioinformatics tools Lifemapper, SHRIMP, BWA, and BWA-GATK to capture the inherent nature of a workflow's resource consumption, the clustering produced by the model, and its usefulness in real workflows. In Lifemapper, the minimum savings in time, cores, memory, and disk are 13.82%, 16.62%, 49.15%, and 93.89%, respectively. In SHRIMP, BWA, and BWA-GATK, the minimum savings in cores, memory, and disk are 50%, 90.14%, and 51.82%, respectively. Compared with a fixed resource allocation strategy, our approach provides a noticeable reduction in workflow resource consumption.
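The three steps in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain textbook DBSCAN as the density-based model, Euclidean distance on unscaled features, and made-up usage records of the form (memory in GB, disk in GB). The function names `dbscan` and `max_request_per_cluster`, the parameters `eps` and `min_pts`, and the choice to leave noise jobs without a cluster-level request are all assumptions made for illustration.

```python
from math import dist

def dbscan(points, eps, min_pts):
    """Textbook DBSCAN: return a cluster id per point, or -1 for noise."""
    UNSEEN, NOISE = None, -1
    labels = [UNSEEN] * len(points)

    def neighbors(i):
        # All points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point joins the cluster
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point: keep expanding
    return labels

def max_request_per_cluster(points, labels):
    """Step (3): per-dimension maximum of observed usage in each cluster."""
    requests = {}
    for p, label in zip(points, labels):
        if label == -1:
            continue  # assumption: noise jobs get no cluster-level request
        current = requests.get(label)
        requests[label] = tuple(p) if current is None else tuple(
            max(a, b) for a, b in zip(current, p))
    return requests

# Step (1): hypothetical monitor output, one (memory_gb, disk_gb) per job.
usage = [
    (1.0, 2.0), (1.2, 2.1), (0.9, 1.8), (1.1, 2.2),   # small jobs
    (8.0, 20.0), (8.3, 19.5), (7.9, 20.4),            # large jobs
    (30.0, 5.0),                                      # isolated outlier
]

labels = dbscan(usage, eps=1.0, min_pts=3)            # step (2)
requests = max_request_per_cluster(usage, labels)     # step (3)
print(labels)
print(requests)
```

Using the per-cluster maximum means every observed job in a cluster would have fit within the request, while jobs in a low-consumption cluster are not charged for the peak of a different, heavier cluster. In practice the features would likely need scaling before clustering, since raw units such as cores and gigabytes are not directly comparable.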
Pages: 1-9
Page count: 9