Acceleration of Data-Intensive Workflow Applications by Using File Access History

Cited by: 3
Authors
Horiuchi, Miki [1]
Taura, Kenjiro [1]
Affiliations
[1] Univ Tokyo, Tokyo, Japan
Source
2012 SC COMPANION: HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SCC) | 2012
DOI
10.1109/SC.Companion.2012.31
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Data I/O has been one of the major bottlenecks in the execution of data-intensive workflow applications. Appropriate task scheduling of a workflow can achieve high I/O throughput by reducing remote data accesses. However, most such task scheduling algorithms require the user to explicitly describe the files to be accessed by each job, typically through stage-in/stage-out directives in the job description, and such annotations are at best tedious and sometimes impossible. Thus, a more automated mechanism is necessary. In this paper, we propose a method for predicting the input/output files of each job without user-supplied annotations. It predicts I/O files by collecting file access history in a profiling run prior to the production run. We implemented the proposed method in the workflow system GXP Make and the distributed file system Mogami. We evaluate our system with two real workflow applications. Our data-aware job scheduler increases the ratio of local file accesses from 50% to 75% in one application and from 23% to 45% in the other. As a result, it reduces the makespan of the two applications by 2.5% and 7.5%, respectively.
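The idea in the abstract (record file accesses in a profiling run, then place each job near its predicted inputs) can be illustrated with a minimal Python sketch. This is not the GXP Make or Mogami implementation; the function names, the `(job, path, mode)` history format, the placement rule (pick the node holding the most predicted input bytes), and the example data are all assumptions made for illustration.

```python
from collections import defaultdict


def predict_inputs(history):
    """Map each job name to the set of files it read in the profiling run.

    `history` is a hypothetical list of (job, path, mode) tuples,
    where mode is "r" for a read and "w" for a write.
    """
    inputs = defaultdict(set)
    for job, path, mode in history:
        if mode == "r":
            inputs[job].add(path)
    return dict(inputs)


def pick_node(job, inputs, location, size, nodes):
    """Choose the node that already stores the most predicted input bytes.

    This greedy rule is an assumption for illustration, not the paper's
    scheduler. Ties go to the node listed first in `nodes`.
    """
    local_bytes = {node: 0 for node in nodes}
    for path in inputs.get(job, ()):
        node = location.get(path)
        if node in local_bytes:
            local_bytes[node] += size.get(path, 0)
    return max(nodes, key=lambda n: local_bytes[n])


# Hypothetical profiling-run history and file placement.
history = [
    ("align", "/data/a.fa", "r"),
    ("align", "/data/b.fa", "r"),
    ("align", "/out/a.sam", "w"),
    ("merge", "/out/a.sam", "r"),
]
location = {"/data/a.fa": "node1", "/data/b.fa": "node2", "/out/a.sam": "node2"}
size = {"/data/a.fa": 100, "/data/b.fa": 400, "/out/a.sam": 50}
nodes = ["node1", "node2"]

inputs = predict_inputs(history)
align_node = pick_node("align", inputs, location, size, nodes)
merge_node = pick_node("merge", inputs, location, size, nodes)
```

In this toy data, `align` reads more bytes from `node2` than from `node1`, so the sketch places it on `node2`; a job with no recorded history falls back to the first node in the list.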
Pages: 157-165
Number of pages: 9