Multiclass classification of distributed memory parallel computations

Cited by: 7
Authors
Whalen, Sean [1]
Peisert, Sean [2, 3]
Bishop, Matt [3]
Affiliations
[1] Columbia Univ, Dept Comp Sci, New York, NY 10027 USA
[2] Univ Calif Berkeley, Lawrence Berkeley Natl Lab, Berkeley, CA 94720 USA
[3] Univ Calif Davis, Dept Comp Sci, Davis, CA 95616 USA
Keywords
Multiclass classification; Bayesian networks; Random forests; Self-organizing maps; High performance computing; Communication patterns; Network motifs
DOI
10.1016/j.patrec.2012.10.007
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
High Performance Computing (HPC) is a field concerned with solving large-scale problems in science and engineering. However, the computational infrastructure of HPC systems can also be misused, as demonstrated by the recent commoditization of cloud computing resources on the black market. As a first step towards addressing this, we introduce a machine learning approach for classifying distributed parallel computations based on communication patterns between compute nodes. We first provide relevant background on message passing and computational equivalence classes called dwarfs and describe our exploratory data analysis using self-organizing maps. We then present our classification results across 29 scientific codes using Bayesian networks and compare their performance against Random Forest classifiers. These models, trained with hundreds of gigabytes of communication logs collected at Lawrence Berkeley National Laboratory, perform well without any a priori information and address several shortcomings of previous approaches. (C) 2012 Elsevier B.V. All rights reserved.
Pages: 322-329
Page count: 8