Parallel matrix multiplication on a linear array with a reconfigurable pipelined bus system

Cited by: 14

Authors:

Li, KQ [1]

Pan, VY

Affiliations:

[1] SUNY Coll New Paltz, Dept Comp Sci, New Paltz, NY 12561 USA

[2] CUNY Herbert H Lehman Coll, Dept Math & Comp Sci, Bronx, NY 10468 USA

Source:

IEEE TRANSACTIONS ON COMPUTERS | 2001 / Vol. 50 / No. 5

Funding:

National Aeronautics and Space Administration (NASA); National Science Foundation (NSF);

Keywords:

bilinear algorithm; cost-optimality; distributed memory system; linear array; matrix multiplication; optical pipelined bus; PRAM; reconfigurable system; speedup;

DOI:

10.1109/12.926164

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

The known fast sequential algorithms for multiplying two $N \times N$ matrices (over an arbitrary ring) have time complexity $O(N^{\alpha})$, where $2 < \alpha < 3$. The current best value of $\alpha$ is less than 2.3755. We show that, for all $1 \le p \le N^{\alpha}$, multiplying two $N \times N$ matrices can be performed on a $p$-processor linear array with a reconfigurable pipelined bus system (LARPBS) in $O(N^{\alpha}/p + (N^{2}/p^{2/\alpha}) \log p)$ time. This is currently the fastest parallelization of the best known sequential matrix multiplication algorithm on a distributed memory parallel system. In particular, for all $1 \le p \le N^{2.3755}$, multiplying two $N \times N$ matrices can be performed on a $p$-processor LARPBS in $O(N^{2.3755}/p + (N^{2}/p^{2/\alpha}) \log p)$ time, and linear speedup can be achieved for $p$ as large as $O(N^{2.3755}/(\log N)^{6.3262})$. Furthermore, multiplying two $N \times N$ matrices can be performed on an LARPBS with $O(N^{\alpha})$ processors in $O(\log N)$ time. This compares favorably with the performance on a PRAM.
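
The bounds in the abstract can be checked numerically. The following is a minimal sketch, not the authors' code: the function names are hypothetical, all hidden constants are dropped, and the logarithm is assumed to be base 2 (the base does not affect the asymptotic claims). It evaluates the two terms of the time bound and computes the exponent of $\log N$ at which the two terms balance, which is where the 6.3262 in the linear-speedup claim appears to come from, since $\alpha/(\alpha-2) \approx 6.3262$ for $\alpha = 2.3755$.

```python
import math

ALPHA = 2.3755  # upper bound on the exponent quoted in the abstract


def larpbs_time_terms(n, p, alpha=ALPHA):
    """Return the two terms of N^alpha/p + (N^2 / p^(2/alpha)) * log p.

    Constants hidden by the O-notation are dropped and log is taken
    base 2 (an assumption; the abstract does not fix a base).
    Requires p >= 1.
    """
    first_term = n ** alpha / p
    second_term = (n ** 2 / p ** (2.0 / alpha)) * math.log2(p)
    return first_term, second_term


def speedup_log_exponent(alpha=ALPHA):
    """Exponent e such that p = N^alpha / (log N)^e balances both terms.

    Setting N^2 * log p / p^(2/alpha) <= N^alpha / p and solving for p
    gives p <= N^alpha / (log p)^(alpha / (alpha - 2)).
    """
    return alpha / (alpha - 2.0)


if __name__ == "__main__":
    print(round(speedup_log_exponent(), 4))
    # With p = N^alpha processors the first term is 1 and the second
    # term is alpha * log2(N), consistent with the O(log N) claim.
    n = 1024
    print(larpbs_time_terms(n, n ** ALPHA))
```

With $p = 1$ the second term vanishes and the bound reduces to the sequential cost $N^{\alpha}$, which gives a quick sanity check on the formula.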

Pages: 519-525

Page count: 7