Calculation of Distributed-Order Fractional Derivative on Tensor Cores-Enabled GPU

Cited: 0
Authors
Bohaienko, Vsevolod [1 ]
Affiliations
[1] NAS Ukraine, VM Glushkov Inst Cybernet, Glushkov Ave 40, Kiev, Ukraine
Keywords
Distributed-order derivative; Parallel computation; GPU; Tensor cores; Diffusion; DIFFERENTIAL-EQUATIONS; ALGORITHM; SCHEME;
DOI
10.1007/s10766-023-00754-9
Chinese Library Classification
TP301 [Theory and Methods];
Discipline Code
081202;
Abstract
Due to the increased computational complexity of calculating the values of the distributed-order Caputo fractional derivative compared to the classical Caputo derivative, there is a need for new techniques that accelerate this computation. In this paper we propose, for this purpose, to use the fast matrix "multiply and accumulate" operation available in GPUs that contain so-called tensor cores. We present and experimentally analyze the properties of GPU algorithms that are based on the L1 finite-difference approximation of the derivative, and incorporate them into the Crank-Nicolson scheme for the distributed-order time-fractional diffusion equation. The computation of the derivative's values on the GPU was faster than a multi-threaded CPU implementation only for a large number of time steps, with the performance gain growing as the number of time steps increases. Using the single-precision data type increased the error by up to 2.7% compared with the double-precision data type; half-precision computations in tensor cores increased the error by up to 29.5%. When solving a time-fractional diffusion equation, the GPU algorithms using single precision were at least three times faster than the CPU implementation for more than 1280 time steps. Data-type precision had only a slight influence on the solution error, while execution time increased significantly when the double-precision data type was used for data storage and processing.
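To illustrate why this maps well onto tensor cores: under the L1 scheme, the derivative values at all time steps can be obtained from a single lower-triangular weight matrix applied to the function's history, and a distributed-order derivative is a quadrature-weighted sum of such matrices, so the whole computation collapses to one matrix "multiply and accumulate". The Python sketch below demonstrates this structure; the function names, the quadrature nodes `alphas`, and the weights are illustrative assumptions, not the paper's actual implementation, which performs the product in half precision on tensor cores.

```python
import numpy as np
from math import gamma

def l1_weight_matrix(n_steps, alpha, dt):
    """Lower-triangular matrix W such that (W @ f) approximates the
    Caputo derivative of order alpha at t_1..t_n via the L1 scheme."""
    # L1 weights: b_k = (k+1)^(1-alpha) - k^(1-alpha)
    k = np.arange(n_steps)
    b = (k + 1.0) ** (1.0 - alpha) - k ** (1.0 - alpha)
    c = dt ** (-alpha) / gamma(2.0 - alpha)
    W = np.zeros((n_steps, n_steps + 1))
    for n in range(1, n_steps + 1):      # row n-1 gives the derivative at t_n
        for j in range(n):               # j-th backward difference f_{n-j} - f_{n-j-1}
            W[n - 1, n - j] += c * b[j]
            W[n - 1, n - j - 1] -= c * b[j]
    return W

def distributed_order_derivative(f, dt, alphas, weights):
    """Distributed-order Caputo derivative approximated by a quadrature
    over the order: sum_j w_j * D^{alpha_j} f. The summed matrix W is
    precomputed once, so each evaluation is one matrix-vector
    multiply-accumulate (the operation tensor cores accelerate,
    there in half precision)."""
    n_steps = len(f) - 1
    W = sum(w * l1_weight_matrix(n_steps, a, dt)
            for a, w in zip(alphas, weights))
    return W @ np.asarray(f, dtype=float)
```

For f(t) = t the L1 scheme is exact, since its backward differences are constant: with a single order alpha the result equals t^(1-alpha)/Gamma(2-alpha), which gives a quick correctness check.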
Pages: 256-270 (15 pages)