Workflow resiliency for large-scale distributed applications

Cited by: 5
Authors
Nguyen, Toan [1]
Desideri, Jean-Antoine [1]
Selmin, Vittorio [2]
Affiliations
[1] INRIA, Ctr Rech Grenoble Rhone Alpes, FR-38334 Saint Ismier, France
[2] Alenia Aeronaut, I-10146 Turin, Italy
Source
2009 THIRD INTERNATIONAL CONFERENCE ON ADVANCED ENGINEERING COMPUTING AND APPLICATIONS IN SCIENCES (ADVCOMP 2009) | 2009
Keywords
workflows; resiliency; distributed computing; parallel computing; large-scale applications
DOI
10.1109/ADVCOMP.2009.9
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Large-scale simulation and optimization are demanding applications that require high-performance computing platforms. Because their economic impact is fundamental to the industry, they also require robust, seamless and effective mechanisms to support dynamic user interactions, as well as fault-tolerance and resiliency on parallel computing platforms. Distributed workflows are considered here as a means to support large-scale dynamic and resilient multiphysics simulation and optimization applications, such as multiphysics aircraft simulation.
Pages: 7 / +
Number of pages: 2