IMPLEMENTATION OF THE BLAS LEVEL-3 AND LINPACK BENCHMARK ON THE AP1000

Authors
BRENT, RP
STRAZDINS, PE
Source
FUJITSU SCIENTIFIC & TECHNICAL JOURNAL | 1993 / Vol. 29 / No. 1
DOI
Not available
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Subject classification codes
0808; 0809
Abstract
This paper describes an implementation of Level 3 of the Basic Linear Algebra Subprograms (BLAS-3) library and of the LINPACK benchmark on the Fujitsu AP1000. The performance of these applications is regarded as important for distributed-memory architectures such as the AP1000. We discuss the techniques used to optimize these applications without significantly sacrificing numerical stability; many of them can also be applied to other numerical applications. They include software pipelining and loop unrolling to optimize computation on each scalar processor, use of the AP1000's fast communication primitives (particularly row and column broadcasts using wormhole routing), blocking and partitioning methods, and 'fast' algorithms that use fewer floating-point operations. These techniques achieve 85-90% of the AP1000's theoretical peak speed for the BLAS Level 3 procedures and up to 80% for the LINPACK benchmark.
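The abstract names blocking, loop unrolling and 'fast' reduced-operation algorithms without showing what they look like. The sketch below is a minimal illustration only, not the authors' AP1000 code: it assumes plain Python lists of lists instead of the AP1000's cells and communication network, and it uses Strassen's algorithm as the example of a 'fast' method, since the abstract does not say which fast algorithm was used. The function names matmul_blocked and strassen_one_level and the block size nb are hypothetical.

```python
# Illustrative sketch only (hypothetical names, not the paper's implementation):
# blocked matrix multiply with a 2-way unrolled inner loop, plus one level of
# Strassen's method as an assumed example of a "fast" algorithm.

def matmul_blocked(A, B, nb=2):
    """Return C = A*B, computed block by block with block size nb."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, nb):
        for p0 in range(0, k, nb):
            for j0 in range(0, m, nb):
                # Multiply the (i0, p0) block of A by the (p0, j0) block of B.
                for i in range(i0, min(i0 + nb, n)):
                    Ci = C[i]
                    for p in range(p0, min(p0 + nb, k)):
                        a = A[i][p]
                        Bp = B[p]
                        j, jend = j0, min(j0 + nb, m)
                        # Inner loop unrolled by 2: two independent updates per pass.
                        while j + 1 < jend:
                            Ci[j] += a * Bp[j]
                            Ci[j + 1] += a * Bp[j + 1]
                            j += 2
                        if j < jend:  # clean-up iteration for an odd block width
                            Ci[j] += a * Bp[j]
    return C


def _add(X, Y, sign=1.0):
    """Elementwise X + sign*Y."""
    return [[x + sign * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def _split(M):
    """Split a square matrix of even order into four equal quadrants."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def strassen_one_level(A, B):
    """One level of Strassen: 7 half-size products instead of 8 (even order)."""
    A11, A12, A21, A22 = _split(A)
    B11, B12, B21, B22 = _split(B)
    mm = matmul_blocked
    M1 = mm(_add(A11, A22), _add(B11, B22))
    M2 = mm(_add(A21, A22), B11)
    M3 = mm(A11, _add(B12, B22, -1.0))
    M4 = mm(A22, _add(B21, B11, -1.0))
    M5 = mm(_add(A11, A12), B22)
    M6 = mm(_add(A21, A11, -1.0), _add(B11, B12))
    M7 = mm(_add(A12, A22, -1.0), _add(B21, B22))
    C11 = _add(_add(M1, M4), _add(M7, M5, -1.0))  # M1 + M4 - M5 + M7
    C12 = _add(M3, M5)
    C21 = _add(M2, M4)
    C22 = _add(_add(M1, M2, -1.0), _add(M3, M6))  # M1 - M2 + M3 + M6
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bottom = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bottom
```

For example, matmul_blocked([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]]) gives [[58.0, 64.0], [139.0, 154.0]]. Strassen's method saves one of eight half-size products at the cost of extra additions, and it satisfies a weaker (normwise rather than componentwise) error bound than conventional multiplication, which is why stability has to be weighed against speed when such algorithms are used.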
Pages: 61-70
Page count: 10