Distributed-Memory Parallel JointNMF

Cited: 1
Authors
Eswar, Srinivas [1 ]
Cobb, Benjamin [2 ]
Hayashi, Koby [2 ]
Kannan, Ramakrishnan [3 ]
Ballard, Grey [4 ]
Vuduc, Richard [2 ]
Park, Haesun [2 ]
Affiliations
[1] Argonne Natl Lab, Lemont, IL 60439 USA
[2] Georgia Inst Technol, Sch Computat Sci & Engn, Atlanta, GA 30332 USA
[3] Oak Ridge Natl Lab, Oak Ridge, TN USA
[4] Wake Forest Univ, Dept Comp Sci, Winston Salem, NC 27101 USA
Source
PROCEEDINGS OF THE 37TH INTERNATIONAL CONFERENCE ON SUPERCOMPUTING, ACM ICS 2023 | 2023
Funding
U.S. Department of Energy; U.S. National Science Foundation;
Keywords
High Performance Computing; Multimodal Inputs; Nonnegative Matrix Factorization; NONNEGATIVE MATRIX; COMMUNICATION; MPI;
DOI
10.1145/3577193.3593733
Chinese Library Classification
TP301 [Theory and Methods];
Discipline Code
081202;
Abstract
Joint Nonnegative Matrix Factorization (JointNMF) is a hybrid method for mining information from datasets that contain both feature and connection information. We propose distributed-memory parallelizations of three algorithms for solving the JointNMF problem, based on Alternating Nonnegative Least Squares (ANLS), Projected Gradient Descent, and Projected Gauss-Newton. We extend well-known communication-avoiding algorithms from the single-processor-grid case to our coupled case on two processor grids. We demonstrate the scalability of the algorithms on up to 960 cores (40 nodes) with 60% parallel efficiency. The more sophisticated ANLS and Gauss-Newton variants outperform the first-order gradient descent method in reducing the objective on large-scale problems. We perform a topic modelling task on a large corpus of academic papers consisting of over 37 million paper abstracts and nearly a billion citation relationships, demonstrating the utility and scalability of the methods.
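To make the problem setting concrete, the JointNMF objective coupling a feature matrix X and a symmetric connection matrix S is commonly written as min_{W,H >= 0} ||X - WH||_F^2 + alpha * ||S - H^T H||_F^2. The sketch below is a minimal serial illustration of one of the three solvers the abstract names, projected gradient descent; the function name `jointnmf_pgd`, the step size, and the choice of alpha are illustrative assumptions, not the paper's distributed-memory implementation or settings.

```python
import numpy as np

def jointnmf_pgd(X, S, k, alpha=1.0, step=1e-3, iters=200, seed=0):
    """Hedged sketch: projected gradient descent for the JointNMF objective
        min_{W,H >= 0} ||X - W H||_F^2 + alpha * ||S - H^T H||_F^2,
    where X (m x n) holds features and S (n x n, symmetric) holds connections.
    Serial toy version only; shapes, step, and alpha are illustrative."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(iters):
        R = W @ H - X            # residual of the feature-matrix fit
        T = S - H.T @ H          # residual of the connection-matrix fit
        grad_W = 2 * R @ H.T                      # d/dW of ||X - WH||^2
        grad_H = 2 * W.T @ R - 4 * alpha * (H @ T)  # adds d/dH of alpha*||S - H^T H||^2
        W = np.maximum(0.0, W - step * grad_W)    # gradient step + projection onto W >= 0
        H = np.maximum(0.0, H - step * grad_H)    # gradient step + projection onto H >= 0
    obj = np.linalg.norm(X - W @ H) ** 2 + alpha * np.linalg.norm(S - H.T @ H) ** 2
    return W, H, obj
```

The ANLS and Gauss-Newton variants discussed in the paper replace the fixed-step gradient update with, respectively, exact nonnegative least-squares subproblem solves and second-order steps, which is why they reduce this same objective faster on large problems.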
Pages: 301 - 312 (12 pages)
Related Papers (50 total)
  • [21] Locality-preserving dynamic load balancing for data-parallel applications on distributed-memory multiprocessors
    Liu, PF
    Wu, JJ
    Yang, CH
    JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 2002, 18 (06) : 1037 - 1048
  • [22] Distributed-Memory Parallel Algorithms for Generating Massive Scale-free Networks Using Preferential Attachment Model
    Alam, Maksudul
    Khan, Maleq
    Marathe, Madhav V.
    2013 INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SC), 2013,
  • [23] Performing BMMC permutations efficiently on distributed-memory multiprocessors with MPI
    Cormen, TH
    Clippinger, JC
    ALGORITHMICA, 1999, 24 (3-4) : 349 - 370
  • [24] Efficient Lagrangian particle tracking algorithms for distributed-memory architectures
    Baldan, Giacomo
    Bellosta, Tommaso
    Guardone, Alberto
    COMPUTERS & FLUIDS, 2023, 256
  • [25] Accelerating Distributed-Memory Autotuning via Statistical Analysis of Execution Paths
    Hutter, Edward
    Solomonik, Edgar
    2021 IEEE 35TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2021, : 46 - 57
  • [26] Shared- and distributed-memory parallelization of a Lagrangian atmospheric dispersion model
    Larson, DJ
    Nasstrom, JS
    ATMOSPHERIC ENVIRONMENT, 2002, 36 (09) : 1559 - 1564
  • [27] Shared-memory, distributed-memory, and mixed-mode parallelisation of a CFD simulation code
    Jackson, Adrian
    Campobasso, M. Sergio
    COMPUTER SCIENCE-RESEARCH AND DEVELOPMENT, 2011, 26 (3-4) : 187 - 195
  • [28] Parallel Computation of Component Trees on Distributed Memory Machines
    Goetz, Markus
    Cavallaro, Gabriele
    Geraud, Thierry
    Book, Matthias
    Riedel, Morris
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2018, 29 (11) : 2582 - 2598
  • [29] An On-the-Fly Method to Exchange Vector Clocks in Distributed-Memory Programs
    Schwitanski, Simon
    Tomski, Felix
    Protze, Joachim
    Terboven, Christian
    Mueller, Matthias S.
    2022 IEEE 36TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW 2022), 2022, : 530 - 540
  • [30] A distributed-memory MPI parallelization scheme for multi-domain incompressible SPH
    Monteleone, A.
    Burriesci, G.
    Napoli, E.
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2022, 170 : 53 - 67