Accurate Low-Bit Length Floating-Point Arithmetic with Sorting Numbers

Cited: 0
Authors
Alireza Dehghanpour
Javad Khodamoradi Kordestani
Masoud Dehyadegari
Affiliations
[1] K. N. Toosi University of Technology,Faculty of Computer Engineering
[2] Institute for Research in Fundamental Sciences (IPM),School of Computer Science
Source
Neural Processing Letters | 2023, Vol. 55
Keywords
Deep neural networks; Floating point; Sorting; AlexNet; Convolutional neural networks;
DOI
Not available
Abstract
A 32-bit floating-point format is commonly used for the development and training of deep neural networks. Training and inference in deep-learning-optimized low-bit formats can yield enormous performance and energy-efficiency advantages; however, training and running inference with low-bit neural networks remains a significant challenge. In this study, we propose a sorting method that maintains accuracy in numerical formats with a low number of bits. We tested this method on convolutional neural networks, including AlexNet. Using our method, the accuracy our convolutional neural network achieves with 11 bits matches that of the IEEE 32-bit format, and AlexNet's accuracy with 10 bits likewise matches the IEEE 32-bit format. These results suggest that the sorting method is promising for limited-precision computation.
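This record does not describe the algorithm itself, but the title suggests ordering numbers before low-precision accumulation. A minimal, illustrative Python sketch of that general idea (not the paper's exact method): summing values in order of increasing magnitude prevents small terms from being swallowed by a large running total when every intermediate result is rounded to a short mantissa. The `quantize` helper is a hypothetical stand-in that emulates a low-bit-length format by rounding the significand.

```python
import math

def quantize(x, mantissa_bits):
    # Round x to the nearest value representable with `mantissa_bits`
    # significand bits, emulating a low-bit floating-point format.
    # Illustrative only; not the format used in the paper.
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)          # x = m * 2**e, with 0.5 <= |m| < 1
    scale = 1 << mantissa_bits
    return math.ldexp(round(m * scale) / scale, e)

def low_bit_sum(values, mantissa_bits, sort_first):
    # Accumulate with quantization after every addition. If sort_first is
    # True, the summands are ordered by magnitude so the small terms are
    # added together before they meet any large running total.
    vals = sorted(values, key=abs) if sort_first else list(values)
    total = 0.0
    for v in vals:
        total = quantize(total + quantize(v, mantissa_bits), mantissa_bits)
    return total
```

For example, summing one large value and many tiny ones with a 10-bit mantissa: added in the given order, each tiny term falls below half an ulp of the large total and is rounded away, while the sorted order lets the tiny terms accumulate first and survive.

```python
vals = [1000.0] + [0.001] * 1000   # exact sum is 1001.0
low_bit_sum(vals, 10, sort_first=False)   # stalls at 1000.0
low_bit_sum(vals, 10, sort_first=True)    # recovers the small terms
```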
Pages: 12061–12078
Page count: 17
Related Papers (50 in total)
  • [1] Accurate Low-Bit Length Floating-Point Arithmetic with Sorting Numbers
    Dehghanpour, Alireza
    Kordestani, Javad Khodamoradi
    Dehyadegari, Masoud
    NEURAL PROCESSING LETTERS, 2023, 55 (09) : 12061 - 12078
  • [2] Arithmetic Coding for Floating-Point Numbers
    Fischer, Marc
    Riedel, Oliver
    Lechler, Armin
    Verl, Alexander
    2021 IEEE CONFERENCE ON DEPENDABLE AND SECURE COMPUTING (DSC), 2021
  • [3] Accurate Floating-point Operation using Controlled Floating-point Precision
    Zaki, Ahmad M.
    Bahaa-Eldin, Ayman M.
    El-Shafey, Mohamed H.
    Aly, Gamal M.
    2011 IEEE PACIFIC RIM CONFERENCE ON COMMUNICATIONS, COMPUTERS AND SIGNAL PROCESSING (PACRIM), 2011, : 696 - 701
  • [4] Unum: Adaptive Floating-Point Arithmetic
    Morancho, Enric
    19TH EUROMICRO CONFERENCE ON DIGITAL SYSTEM DESIGN (DSD 2016), 2016, : 651 - 656
  • [5] ARE IEEE 754 32-BIT AND 64-BIT BINARY FLOATING-POINT ACCURATE ENOUGH?
    Hutabarat, Bernaridho
    Purnama, I. Ketut Eddy
    Hariadi, Mochamad
    Purnomo, Mauridhi Hery
    MAKARA JOURNAL OF TECHNOLOGY, 2011, 15 (01): : 68 - 74
  • [6] Verifying Bit-Manipulations of Floating-Point
    Lee, Wonyeol
    Sharma, Rahul
    Aiken, Alex
    ACM SIGPLAN NOTICES, 2016, 51 (06) : 70 - 84
  • [7] Accurate ICP-based Floating-Point Reasoning
    Scheibler, Karsten
    Neubauer, Felix
    Mahdi, Ahmed
    Fraenzle, Martin
    Teige, Tino
    Bienmueller, Tom
    Fehrer, Detlef
    Becker, Bernd
    PROCEEDINGS OF THE 2016 16TH CONFERENCE ON FORMAL METHODS IN COMPUTER-AIDED DESIGN (FMCAD 2016), 2016, : 177 - 184
  • [8] Sabrewing: A Lightweight Architecture for Combined Floating-Point and Integer Arithmetic
    Bruintjes, Tom M.
    Walters, Karel H. G.
    Gerez, Sabih H.
    Molenkamp, Bert
    Smit, Gerard J. M.
    ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2012, 8 (04)
  • [9] Building Better Bit-Blasting for Floating-Point Problems
    Brain, Martin
    Schanda, Florian
    Sun, Youcheng
    TOOLS AND ALGORITHMS FOR THE CONSTRUCTION AND ANALYSIS OF SYSTEMS, PT I, 2019, 11427 : 79 - 98
  • [10] The design of a 32-bit floating-point RISC microprocessor
    Qian, G
    Li, L
    Shen, XB
    Xu, Q
    Zhao, N
    2001 4TH INTERNATIONAL CONFERENCE ON ASIC PROCEEDINGS, 2001, : 760 - 764