Accurate Low-Bit Length Floating-Point Arithmetic with Sorting Numbers

Cited by: 0

Authors:

Alireza Dehghanpour

Javad Khodamoradi Kordestani

Masoud Dehyadegari

Affiliations:

[1] K. N. Toosi University of Technology, Faculty of Computer Engineering

[2] Institute for Research in Fundamental Sciences (IPM), School of Computer Science

Source:

Neural Processing Letters | 2023 / Vol. 55

Keywords:

Deep neural networks; Floating point; Sorting; AlexNet; Convolutional neural networks;

DOI:

Not available

Abstract:

A 32-bit floating-point format is often used for the development and training of deep neural networks. Training and inference in deep learning-optimized codecs can result in enormous performance and energy efficiency advantages. However, training and inference with low-bit neural networks still pose a significant challenge. In this study, we propose a sorting method that maintains accuracy in numerical formats with a low number of bits. We tested this method on convolutional neural networks, including AlexNet. Using our method, we found that in our convolutional neural network, the accuracy achieved with 11 bits matches that of the IEEE 32-bit format. Similarly, in AlexNet, the accuracy achieved with 10 bits matches that of the IEEE 32-bit format. These results suggest that the sorting method shows promise for calculations with limited precision.
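
The abstract does not say how the sorting is applied, so the following is only a generic illustration of why operand order can matter in low-bit floating-point accumulation, not the authors' method. It assumes IEEE 754 half precision (16 bits) as a stand-in for the 10- and 11-bit formats mentioned above, and the helper names `to_half`, `low_precision_sum`, and `sorted_low_precision_sum` are hypothetical.

```python
import struct


def to_half(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def low_precision_sum(values):
    """Accumulate left to right, rounding to binary16 after every addition."""
    acc = 0.0
    for v in values:
        acc = to_half(acc + to_half(v))
    return acc


def sorted_low_precision_sum(values):
    """Same accumulation, but with operands sorted by ascending magnitude.

    Assumption: ascending-magnitude ordering is one common heuristic;
    the paper may sort differently.
    """
    return low_precision_sum(sorted(values, key=abs))


# One large term followed by many small ones; the exact sum is 5096.
values = [4096.0] + [1.0] * 1000

naive = low_precision_sum(values)            # small terms are rounded away
ordered = sorted_low_precision_sum(values)   # small terms accumulate first

print(naive, ordered)
```

In this sketch the unsorted sum stalls at 4096 because each added 1.0 is smaller than half the spacing between adjacent half-precision values at that magnitude, while the sorted sum first accumulates the small terms exactly and then adds the large one.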

Pages: 12061-12078

Page count: 17