Accurate Low-Bit Length Floating-Point Arithmetic with Sorting Numbers

Cited: 0
Authors
Alireza Dehghanpour
Javad Khodamoradi Kordestani
Masoud Dehyadegari
Affiliations
[1] K. N. Toosi University of Technology,Faculty of Computer Engineering
[2] Institute for Research in Fundamental Sciences (IPM),School of Computer Science
Source
Neural Processing Letters | 2023, Vol. 55
Keywords
Deep neural networks; Floating point; Sorting; AlexNet; Convolutional neural networks;
DOI
Not available
Abstract
A 32-bit floating-point format is commonly used for the development and training of deep neural networks. Training and inference in deep-learning-optimized low-bit formats can yield enormous performance and energy-efficiency advantages; however, training and running inference with low-bit neural networks remains a significant challenge. In this study, we propose a sorting method that maintains accuracy in numerical formats with a low number of bits. We tested this method on convolutional neural networks, including AlexNet. Using our method, the accuracy our convolutional neural network achieves with 11 bits matches that of the IEEE 32-bit format, and AlexNet's accuracy with 10 bits likewise matches the IEEE 32-bit format. These results suggest that the sorting method is promising for limited-precision computation.
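This record does not describe the algorithm itself, but the title suggests ordering numbers before low-precision accumulation. A minimal, illustrative Python sketch of that general idea (not the paper's exact method): summing values in order of increasing magnitude prevents small terms from being swallowed by a large running total when every intermediate result is rounded to a short mantissa. The `quantize` helper is a hypothetical stand-in that emulates a low-bit-length format by rounding the significand.

```python
import math

def quantize(x, mantissa_bits):
    # Round x to the nearest value representable with `mantissa_bits`
    # significand bits, emulating a low-bit floating-point format.
    # Illustrative only; not the format used in the paper.
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)          # x = m * 2**e, with 0.5 <= |m| < 1
    scale = 1 << mantissa_bits
    return math.ldexp(round(m * scale) / scale, e)

def low_bit_sum(values, mantissa_bits, sort_first):
    # Accumulate with quantization after every addition. If sort_first is
    # True, the summands are ordered by magnitude so the small terms are
    # added together before they meet any large running total.
    vals = sorted(values, key=abs) if sort_first else list(values)
    total = 0.0
    for v in vals:
        total = quantize(total + quantize(v, mantissa_bits), mantissa_bits)
    return total
```

For example, summing one large value and many tiny ones with a 10-bit mantissa: added in the given order, each tiny term falls below half an ulp of the large total and is rounded away, while the sorted order lets the tiny terms accumulate first and survive.

```python
vals = [1000.0] + [0.001] * 1000   # exact sum is 1001.0
low_bit_sum(vals, 10, sort_first=False)   # stalls at 1000.0
low_bit_sum(vals, 10, sort_first=True)    # recovers the small terms
```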
Pages: 12061–12078
Page count: 17
Related Papers (50 in total)
  • [1] Accurate Low-Bit Length Floating-Point Arithmetic with Sorting Numbers
    Dehghanpour, Alireza
    Kordestani, Javad Khodamoradi
    Dehyadegari, Masoud
    NEURAL PROCESSING LETTERS, 2023, 55 (09) : 12061 - 12078
  • [2] Arithmetic Coding for Floating-Point Numbers
    Fischer, Marc
    Riedel, Oliver
    Lechler, Armin
    Verl, Alexander
    2021 IEEE CONFERENCE ON DEPENDABLE AND SECURE COMPUTING (DSC), 2021
  • [3] Accurate Floating-point Operation using Controlled Floating-point Precision
    Zaki, Ahmad M.
    Bahaa-Eldin, Ayman M.
    El-Shafey, Mohamed H.
    Aly, Gamal M.
    2011 IEEE PACIFIC RIM CONFERENCE ON COMMUNICATIONS, COMPUTERS AND SIGNAL PROCESSING (PACRIM), 2011, : 696 - 701
  • [4] Unum: Adaptive Floating-Point Arithmetic
    Morancho, Enric
    19TH EUROMICRO CONFERENCE ON DIGITAL SYSTEM DESIGN (DSD 2016), 2016, : 651 - 656
  • [5] ARE IEEE 754 32-BIT AND 64-BIT BINARY FLOATING-POINT ACCURATE ENOUGH?
    Hutabarat, Bernaridho
    Purnama, I. Ketut Eddy
    Hariadi, Mochamad
    Purnomo, Mauridhi Hery
    MAKARA JOURNAL OF TECHNOLOGY, 2011, 15 (01): : 68 - 74
  • [6] Verifying Bit-Manipulations of Floating-Point
    Lee, Wonyeol
    Sharma, Rahul
    Aiken, Alex
    ACM SIGPLAN NOTICES, 2016, 51 (06) : 70 - 84
  • [7] Accurate ICP-based Floating-Point Reasoning
    Scheibler, Karsten
    Neubauer, Felix
    Mahdi, Ahmed
    Fraenzle, Martin
    Teige, Tino
    Bienmueller, Tom
    Fehrer, Detlef
    Becker, Bernd
    PROCEEDINGS OF THE 2016 16TH CONFERENCE ON FORMAL METHODS IN COMPUTER-AIDED DESIGN (FMCAD 2016), 2016, : 177 - 184
  • [8] Sabrewing: A Lightweight Architecture for Combined Floating-Point and Integer Arithmetic
    Bruintjes, Tom M.
    Walters, Karel H. G.
    Gerez, Sabih H.
    Molenkamp, Bert
    Smit, Gerard J. M.
    ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2012, 8 (04)
  • [9] Building Better Bit-Blasting for Floating-Point Problems
    Brain, Martin
    Schanda, Florian
    Sun, Youcheng
    TOOLS AND ALGORITHMS FOR THE CONSTRUCTION AND ANALYSIS OF SYSTEMS, PT I, 2019, 11427 : 79 - 98
  • [10] The design of a 32-bit floating-point RISC microprocessor
    Qian, G
    Li, L
    Shen, XB
    Xu, Q
    Zhao, N
    2001 4TH INTERNATIONAL CONFERENCE ON ASIC PROCEEDINGS, 2001, : 760 - 764