Efficient GPU Implementation of Multiple-Precision Addition based on Residue Arithmetic

Authors
Isupov, Konstantin [1]
Knyazkov, Vladimir [2]
Affiliations
[1] Vyatka State Univ, Dept Elect Comp Machines, Kirov 610000, Russia
[2] Penza State Univ, Res Inst Fundamental & Appl Studies, Penza 440026, Russia
Keywords
Multiple-precision algorithm; integer arithmetic; residue number system; GPU; CUDA
DOI
10.14569/IJACSA.2020.0110901
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
In this work, the residue number system (RNS) is applied to the efficient addition of multiple-precision integers on graphics processing units (GPUs) that support the Compute Unified Device Architecture (CUDA) platform. The RNS allows the digits (residues) of a multiple-precision number to be processed element-wise, without carry propagation or other communication between them, which is especially useful for massively parallel architectures such as GPUs. The paper discusses two multiple-precision integer addition algorithms. The first algorithm relies on if-else statements to test the signs of the operands. The second algorithm instead uses radix complement RNS arithmetic to handle negative numbers. While the first algorithm is more straightforward, the second one avoids branch divergence among threads that concurrently compute different elements of a multiple-precision array. As a result, the second algorithm performs significantly better than the first. When run on an NVIDIA RTX 2080 Ti GPU, both algorithms are faster than the multi-core GNU MP implementation running on an Intel Xeon 4100 processor.
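The contrast drawn in the abstract between sign tests and radix complement can be illustrated with a minimal Python sketch. This is not the authors' CUDA implementation: the toy moduli (7, 11, 13, 15) and every helper name (`to_rns`, `crt`, `from_rns`, `rns_add`, `rns_sub`, `sm_add`) are hypothetical. Magnitudes in the sign-magnitude variant are compared by full CRT decoding purely for simplicity, which is an assumption and not necessarily the comparison method used in the paper. Overflow detection is omitted.

```python
from math import prod

# Toy pairwise coprime moduli chosen for illustration (an assumption);
# multiple-precision RNS libraries use many word-sized moduli instead.
MODULI = (7, 11, 13, 15)
M = prod(MODULI)   # dynamic range: 15015
HALF = M // 2      # signed range is [-HALF, HALF] because M is odd


def to_rns(x):
    """Encode a signed integer in radix complement form.

    Python's % always returns a non-negative result, so for x < 0 the
    residues computed here are exactly the residues of M + x.
    """
    if not -HALF <= x <= HALF:
        raise ValueError("value outside the signed dynamic range")
    return [x % m for m in MODULI]


def crt(r):
    """Chinese Remainder Theorem: recover the unsigned value in [0, M)."""
    x = 0
    for ri, mi in zip(r, MODULI):
        Mi = M // mi
        x += ri * Mi * pow(Mi, -1, mi)  # modular inverse; needs Python 3.8+
    return x % M


def from_rns(r):
    """Decode radix complement: unsigned values above HALF are negative."""
    x = crt(r)
    return x - M if x > HALF else x


def rns_add(a, b):
    # Each residue channel is independent (no carries between digits), so
    # on a GPU each channel or array element could map to its own thread.
    # Overflow beyond [-HALF, HALF] is NOT detected in this sketch.
    return [(ai + bi) % mi for ai, bi, mi in zip(a, b, MODULI)]


def rns_sub(a, b):
    return [(ai - bi) % mi for ai, bi, mi in zip(a, b, MODULI)]


def sm_add(sa, ra, sb, rb):
    """Sign-magnitude addition (sign is +1 or -1, ra/rb hold magnitudes).

    The data-dependent branches below are what cause warp divergence when
    neighbouring GPU threads take different paths.
    """
    if sa == sb:                # same signs: add magnitudes
        return sa, rns_add(ra, rb)
    if crt(ra) >= crt(rb):      # magnitude comparison is not element-wise in RNS
        return sa, rns_sub(ra, rb)
    return sb, rns_sub(rb, ra)


if __name__ == "__main__":
    # Radix complement: one branch-free element-wise addition for any signs.
    print(from_rns(rns_add(to_rns(-1234), to_rns(5678))))  # 4444
    # Sign-magnitude: same sum, reached through sign and magnitude tests.
    s, r = sm_add(-1, to_rns(1234), +1, to_rns(5678))
    print(s * crt(r))  # 4444
```

In the radix complement version the sign is encoded in the value itself, so every thread executes the same instruction sequence regardless of operand signs; the sign is only interpreted when the result is decoded.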
Pages: 1-8
Number of pages: 8