Accelerating number theoretic transform in GPU platform for fully homomorphic encryption

Cited by: 21

Authors:

Goey, Jia-Zheng [1]

Lee, Wai-Kong [2]

Goi, Bok-Min [1]

Yap, Wun-She [1]

Affiliations:

[1] Univ Tunku Abdul Rahman, Jalan Sungai Long, Bandar Sungai Long 43000, Kajang, Malaysia

[2] Univ Tunku Abdul Rahman, Jalan Univ, Bandar Barat 31900, Kampar, Malaysia

Source:

JOURNAL OF SUPERCOMPUTING | 2021 / Volume 77 / Issue 02

Keywords:

Number theoretic transform; Homomorphic encryption; Graphics processing unit; Cryptography; Algorithm; Multiplication

DOI:

10.1007/s11227-020-03156-7

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

In scientific computing and cryptography, many applications involve large integer multiplication, which is a time-consuming operation. To reduce the computational complexity, the number theoretic transform is widely used, wherein the multiplication is performed in the frequency domain. However, large integer multiplication is still not fast enough when the operands are very large (e.g., more than 100K bits). In view of that, several researchers have proposed accelerating the number theoretic transform using massively parallel GPU architectures. In this paper, we propose several techniques to improve the performance of the number theoretic transform, yielding an implementation that is faster than the state-of-the-art work by Dai et al. The proposed techniques include register-based twiddle factor storage and multi-stream asynchronous computation, which leverage features offered in new GPU architectures. The proposed number theoretic transform implementation was applied to the CMNT fully homomorphic encryption scheme proposed by Coron et al. With the proposed implementation technique, homomorphic multiplications in CMNT take 0.27 ms on a GTX1070 desktop GPU and 7.49 ms on a Jetson TX1 embedded system. This shows that the proposed implementation is suitable for practical applications in server environments as well as embedded systems.
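To illustrate the idea the abstract relies on (multiplying large integers by transforming them, multiplying pointwise in the frequency domain, and transforming back), here is a minimal CPU-side sketch in Python. It is not the authors' GPU implementation and does not reproduce their parameters: the modulus 998244353, the generator 3, the 8-bit limb size, and the function names `ntt` and `multiply` are all assumptions chosen for illustration.

```python
# Illustrative sketch only, not the paper's implementation.
# Assumed parameters: prime P = 119 * 2**23 + 1 with primitive root G = 3.
P = 998244353
G = 3


def ntt(values, invert=False):
    """Iterative radix-2 Cooley-Tukey NTT modulo P (length must be a power of 2)."""
    a = list(values)
    n = len(a)
    # Bit-reversal permutation so the butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        # Primitive length-th root of unity (the "twiddle factor" base).
        w_len = pow(G, (P - 1) // length, P)
        if invert:
            w_len = pow(w_len, P - 2, P)
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + half):
                u = a[k]
                v = a[k + half] * w % P
                a[k] = (u + v) % P
                a[k + half] = (u - v) % P
                w = w * w_len % P
        length <<= 1
    if invert:
        n_inv = pow(n, P - 2, P)
        a = [x * n_inv % P for x in a]
    return a


def multiply(x, y, limb_bits=8):
    """Multiply non-negative integers via NTT-based convolution of their limbs.

    With 8-bit limbs and this modulus, each convolution coefficient stays
    below P only while the shorter operand has fewer than roughly 15,000 limbs.
    """
    mask = (1 << limb_bits) - 1

    def to_limbs(v):
        out = []
        while v:
            out.append(v & mask)
            v >>= limb_bits
        return out or [0]

    a, b = to_limbs(x), to_limbs(y)
    n = 1
    while n < len(a) + len(b):
        n <<= 1
    fa = ntt(a + [0] * (n - len(a)))
    fb = ntt(b + [0] * (n - len(b)))
    # Pointwise product in the transform domain equals convolution of the limbs.
    fc = [p * q % P for p, q in zip(fa, fb)]
    c = ntt(fc, invert=True)
    # Python integers absorb the carry propagation when the limbs are recombined.
    return sum(coef << (limb_bits * i) for i, coef in enumerate(c))
```

For example, `multiply(3**2000, 7**1500)` should agree with Python's built-in `3**2000 * 7**1500`. On a GPU, the inner butterfly loops are the part that is typically spread across threads; how the twiddle factors `w` are stored and how transfers overlap with computation are the aspects the abstract's techniques address.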

Citation

Pages: 1455-1474

Page count: 20

References: 15 in total

[1] BARRETT P, 1987, LECT NOTES COMPUT SC, V263, P311

[2] AN ALGORITHM FOR MACHINE CALCULATION OF COMPLEX FOURIER SERIES [J]. COOLEY, JW; TUKEY, JW. MATHEMATICS OF COMPUTATION, 1965, 19 (90): 297-&

[3] CORON JS, 2011, LECT NOTES COMPUTER, V6841

[4] DOROZ Y, 2014, 2 WORKSH APPL HOM CR

[5] HIGH PRECISION INTEGER MULTIPLICATION WITH A GPU USING STRASSEN'S ALGORITHM WITH MULTIPLE FFT SIZES [J]. Emmart, Niall; Weems, Charles C. PARALLEL PROCESSING LETTERS, 2011, 21 (03): 359-375

[6] Gentry C., 2009, Stanford University

[7] Gentry C, 2011, LECT NOTES COMPUT SC, V6632, P129, DOI 10.1007/978-3-642-20465-4_9

[8] Computing zeta functions of arithmetic schemes [J]. Harvey, David. PROCEEDINGS OF THE LONDON MATHEMATICAL SOCIETY, 2015, 111: 1379-1401

[9] CUDA-based parallelization of a bio-inspired model for fast object classification [J]. Hernandez, Daniel E.; Olague, Gustavo; Hernandez, Benjamin; Clemente, Eddie. NEURAL COMPUTING & APPLICATIONS, 2018, 30 (10): 3007-3018

[10] CPU versus GPU: which can perform matrix computation faster-performance comparison for basic linear algebra subprograms [J]. Li, Feng; Ye, Yunming; Tian, Zhaoyang; Zhang, Xiaofeng. NEURAL COMPUTING & APPLICATIONS, 2019, 31 (08): 4353-4365
