Distributed-Memory Parallel JointNMF

Cited: 1
Authors
Eswar, Srinivas [1 ]
Cobb, Benjamin [2 ]
Hayashi, Koby [2 ]
Kannan, Ramakrishnan [3 ]
Ballard, Grey [4 ]
Vuduc, Richard [2 ]
Park, Haesun [2 ]
Affiliations
[1] Argonne Natl Lab, Lemont, IL 60439 USA
[2] Georgia Inst Technol, Sch Computat Sci & Engn, Atlanta, GA 30332 USA
[3] Oak Ridge Natl Lab, Oak Ridge, TN USA
[4] Wake Forest Univ, Dept Comp Sci, Winston Salem, NC 27101 USA
Source
PROCEEDINGS OF THE 37TH INTERNATIONAL CONFERENCE ON SUPERCOMPUTING, ACM ICS 2023 | 2023
Funding
U.S. Department of Energy; U.S. National Science Foundation;
Keywords
High Performance Computing; Multimodal Inputs; Nonnegative Matrix Factorization; NONNEGATIVE MATRIX; COMMUNICATION; MPI;
DOI
10.1145/3577193.3593733
Chinese Library Classification
TP301 [Theory and Methods];
Discipline Code
081202;
Abstract
Joint Nonnegative Matrix Factorization (JointNMF) is a hybrid method for mining information from datasets that contain both feature and connection information. We propose distributed-memory parallelizations of three algorithms for solving the JointNMF problem, based on Alternating Nonnegative Least Squares (ANLS), Projected Gradient Descent, and Projected Gauss-Newton. We extend well-known communication-avoiding algorithms from the single-processor-grid case to our coupled case on two processor grids. We demonstrate the scalability of the algorithms on up to 960 cores (40 nodes) with 60% parallel efficiency. The more sophisticated ANLS and Gauss-Newton variants outperform the first-order gradient descent method in reducing the objective on large-scale problems. We perform a topic modelling task on a large corpus of academic papers consisting of over 37 million paper abstracts and nearly a billion citation relationships, demonstrating the utility and scalability of the methods.
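To make the problem setting concrete, the JointNMF objective coupling a feature matrix X and a symmetric connection matrix S is commonly written as min_{W,H >= 0} ||X - WH||_F^2 + alpha * ||S - H^T H||_F^2. The sketch below is a minimal serial illustration of one of the three solvers the abstract names, projected gradient descent; the function name `jointnmf_pgd`, the step size, and the choice of alpha are illustrative assumptions, not the paper's distributed-memory implementation or settings.

```python
import numpy as np

def jointnmf_pgd(X, S, k, alpha=1.0, step=1e-3, iters=200, seed=0):
    """Hedged sketch: projected gradient descent for the JointNMF objective
        min_{W,H >= 0} ||X - W H||_F^2 + alpha * ||S - H^T H||_F^2,
    where X (m x n) holds features and S (n x n, symmetric) holds connections.
    Serial toy version only; shapes, step, and alpha are illustrative."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(iters):
        R = W @ H - X            # residual of the feature-matrix fit
        T = S - H.T @ H          # residual of the connection-matrix fit
        grad_W = 2 * R @ H.T                      # d/dW of ||X - WH||^2
        grad_H = 2 * W.T @ R - 4 * alpha * (H @ T)  # adds d/dH of alpha*||S - H^T H||^2
        W = np.maximum(0.0, W - step * grad_W)    # gradient step + projection onto W >= 0
        H = np.maximum(0.0, H - step * grad_H)    # gradient step + projection onto H >= 0
    obj = np.linalg.norm(X - W @ H) ** 2 + alpha * np.linalg.norm(S - H.T @ H) ** 2
    return W, H, obj
```

The ANLS and Gauss-Newton variants discussed in the paper replace the fixed-step gradient update with, respectively, exact nonnegative least-squares subproblem solves and second-order steps, which is why they reduce this same objective faster on large problems.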
Pages: 301 - 312 (12 pages)
Related Papers (50 total)
  • [21] Locality-preserving dynamic load balancing for data-parallel applications on distributed-memory multiprocessors
    Liu, PF
    Wu, JJ
    Yang, CH
    JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 2002, 18 (06) : 1037 - 1048
  • [22] Distributed-Memory Parallel Algorithms for Generating Massive Scale-free Networks Using Preferential Attachment Model
    Alam, Maksudul
    Khan, Maleq
    Marathe, Madhav V.
    2013 INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SC), 2013,
  • [23] Performing BMMC permutations efficiently on distributed-memory multiprocessors with MPI
    Cormen, TH
    Clippinger, JC
    ALGORITHMICA, 1999, 24 (3-4) : 349 - 370
  • [24] Efficient Lagrangian particle tracking algorithms for distributed-memory architectures
    Baldan, Giacomo
    Bellosta, Tommaso
    Guardone, Alberto
    COMPUTERS & FLUIDS, 2023, 256
  • [25] Accelerating Distributed-Memory Autotuning via Statistical Analysis of Execution Paths
    Hutter, Edward
    Solomonik, Edgar
    2021 IEEE 35TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2021, : 46 - 57
  • [26] Shared- and distributed-memory parallelization of a Lagrangian atmospheric dispersion model
    Larson, DJ
    Nasstrom, JS
    ATMOSPHERIC ENVIRONMENT, 2002, 36 (09) : 1559 - 1564
  • [27] Shared-memory, distributed-memory, and mixed-mode parallelisation of a CFD simulation code
    Jackson, Adrian
    Campobasso, M. Sergio
    COMPUTER SCIENCE-RESEARCH AND DEVELOPMENT, 2011, 26 (3-4) : 187 - 195
  • [28] Parallel Computation of Component Trees on Distributed Memory Machines
    Goetz, Markus
    Cavallaro, Gabriele
    Geraud, Thierry
    Book, Matthias
    Riedel, Morris
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2018, 29 (11) : 2582 - 2598
  • [29] An On-the-Fly Method to Exchange Vector Clocks in Distributed-Memory Programs
    Schwitanski, Simon
    Tomski, Felix
    Protze, Joachim
    Terboven, Christian
    Mueller, Matthias S.
    2022 IEEE 36TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW 2022), 2022, : 530 - 540
  • [30] A distributed-memory MPI parallelization scheme for multi-domain incompressible SPH
    Monteleone, A.
    Burriesci, G.
    Napoli, E.
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2022, 170 : 53 - 67