Explicit Fourth-Order Runge–Kutta Method on Intel Xeon Phi Coprocessor

Cited by: 0

Authors:

Beata Bylina

Joanna Potiopa

Affiliation:

[1] Maria Curie-Skłodowska University, Department of Computer Science

Source:

International Journal of Parallel Programming | 2017 / Vol. 45

Keywords:

Intel Xeon Phi; Fourth-order Runge–Kutta method; CSR format; Intel Math Kernel Library (Intel MKL); SpMV; OpenMP

DOI:

Not available

Abstract:

This paper concerns an Intel Xeon Phi implementation of the explicit fourth-order Runge–Kutta method (RK4) for very sparse matrices with very short rows. Such matrices arise in the Markovian modeling of computer and telecommunication networks. Two implementations, both using the CSR storage scheme and running on Intel Xeon Phi, were investigated: one based on Intel Math Kernel Library (Intel MKL) routines and the authors' own. The Intel MKL-based implementation uses the high-performance BLAS and Sparse BLAS routines. In our implementation we focus on OpenMP-style programming: we implement the SpMV operation and vector addition using basic optimization techniques and vectorization. We evaluate our approach in native and offload modes for various numbers of cores and thread affinity settings. The two implementations (Intel MKL-based and the authors') were compared with respect to execution time, speedup and performance. Numerical experiments on Intel Xeon Phi show that the authors' implementation is very promising: it is up to two times faster than the multithreaded Intel MKL-based implementation running on the CPU (Intel Xeon processor) and up to three times faster than the Intel MKL-based application running on Intel Xeon Phi.

Pages: 1073-1090

Number of pages: 17