EC-ECC: Accelerating Elliptic Curve Cryptography for Edge Computing on Embedded GPU TX2

Cited by: 16
Authors
Dong, Jiankuo [1]
Zheng, Fangyu [2]
Lin, Jingqiang [3]
Liu, Zhe [4]
Xiao, Fu [1]
Fan, Guang [2]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Sch Comp Sci, Nanjing, Jiangsu, Peoples R China
[2] Chinese Acad Sci, Inst Informat Engn, State Key Lab Informat Secur, Beijing, Peoples R China
[3] Univ Sci & Technol China, Sch Cyber Secur, Hefei, Peoples R China
[4] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
ECC; embedded graphics processing units; edge computing; SECURITY; FORM;
DOI
10.1145/3492734
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Driven by the artificial intelligence and computer vision industries, Graphics Processing Units (GPUs) are rapidly achieving extraordinary computing power. In particular, the NVIDIA Tegra K1/X1/X2 embedded GPU platforms, which are also treated as edge computing devices, are now widely used in embedded environments such as mobile phones, game consoles, and vehicle-mounted systems to support high-dimension display, auto-pilot, and so on. Meanwhile, with the rise of the Internet of Things (IoT), the demand for cryptographic operations for secure communication and authentication between edge computing nodes and IoT devices is also expanding. In this contribution, instead of the conventional implementations based on FPGAs, ASICs, and ARM CPUs, we provide an alternative solution for cryptographic implementation on embedded GPU devices. Targeting the new cipher suite added in TLS 1.3, we implement Edwards25519/448 and Curve25519/448 on an edge computing platform, the embedded GPU NVIDIA Tegra X2, where various performance optimizations are customized for the target platform, including a novel parallel method for register-limited embedded GPUs. With about 15 W of power consumption, it can provide 210k/31k ops/s of Curve25519/448 scalar multiplication, 834k/123k ops/s of fixed-point Edwards25519/448 scalar multiplication, and 150k/22k ops/s of unknown-point Edwards25519/448 scalar multiplication, which are respectively the primitives and main workloads of key agreement, signature generation, and signature verification in the TLS 1.3 protocol. Our implementations achieve a speedup of 8 to 26 times over OpenSSL running on the very powerful ARM CPU of the same platform and outperform the state-of-the-art FPGA implementations by a wide margin with better power efficiency.
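For readers unfamiliar with the key-agreement primitive named in the abstract, the following is a minimal reference sketch of Curve25519 scalar multiplication (the X25519 function) using the Montgomery ladder as specified in RFC 7748. It is not the paper's GPU implementation and reflects none of its optimizations. The function names are illustrative, and the data-dependent branch used for the conditional swap means this sketch is not constant-time.

```python
# Illustrative X25519 (Curve25519 scalar multiplication) after RFC 7748.
# Assumption: plain Python big integers; NOT constant-time, NOT the paper's code.

P = 2**255 - 19      # field prime of Curve25519
A24 = 121665         # (486662 - 2) / 4, curve constant used in the ladder step


def decode_scalar(k: bytes) -> int:
    """Clamp a 32-byte scalar as required by RFC 7748."""
    buf = bytearray(k)
    buf[0] &= 248
    buf[31] &= 127
    buf[31] |= 64
    return int.from_bytes(buf, "little")


def decode_u(u: bytes) -> int:
    """Decode a 32-byte u-coordinate, masking the unused top bit."""
    buf = bytearray(u)
    buf[31] &= 127
    return int.from_bytes(buf, "little")


def x25519(k: bytes, u: bytes) -> bytes:
    """Return the u-coordinate of [k]U as 32 little-endian bytes."""
    scalar = decode_scalar(k)
    x1 = decode_u(u)
    x2, z2, x3, z3 = 1, 0, x1, 1
    swap = 0
    for t in reversed(range(255)):
        bit = (scalar >> t) & 1
        swap ^= bit
        if swap:  # conditional swap (branching here leaks timing information)
            x2, x3 = x3, x2
            z2, z3 = z3, z2
        swap = bit
        a = (x2 + z2) % P
        aa = a * a % P
        b = (x2 - z2) % P
        bb = b * b % P
        e = (aa - bb) % P
        c = (x3 + z3) % P
        d = (x3 - z3) % P
        da = d * a % P
        cb = c * b % P
        x3 = (da + cb) ** 2 % P
        z3 = x1 * (da - cb) ** 2 % P
        x2 = aa * bb % P
        z2 = e * (aa + A24 * e) % P
    if swap:
        x2, x3 = x3, x2
        z2, z3 = z3, z2
    # Convert from projective (x2 : z2) to affine via Fermat inversion.
    return (x2 * pow(z2, P - 2, P) % P).to_bytes(32, "little")


if __name__ == "__main__":
    base = (9).to_bytes(32, "little")   # standard base point u = 9
    alice = bytes(range(32))            # toy private keys, for illustration only
    bob = bytes(range(32, 64))
    shared_a = x25519(alice, x25519(bob, base))
    shared_b = x25519(bob, x25519(alice, base))
    print(shared_a == shared_b)
```

Both parties arrive at the same shared secret because scalar multiplication commutes. This is the operation whose throughput the abstract reports as the key-agreement workload.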
Pages: 25