On the Performance of New Higher Order Transformation Functions for Highly Efficient Dense Layers

Cited: 0
Authors
Atharva Gundawar
Srishti Lodha
V. Vijayarajan
Balaji Iyer
V. B. Surya Prasath
Affiliations
[1] Vellore Institute of Technology Vellore, School of Computer Science and Engineering
[2] Cincinnati Children’s Hospital Medical Center, Division of Biomedical Informatics
[3] University of Cincinnati, Departments of Pediatrics, Biomedical Informatics, Computer Science
Source
Neural Processing Letters | 2023 / Volume 55
Keywords
Dense layer; Neuron; Linear function; Quadratic function; Higher-order functions
DOI: Not available
Abstract
Over the past few decades, many new neural network architectures and deep learning (DL)-based models have been developed to tackle problems more efficiently, rapidly, and accurately. For classification problems, it is typical to use fully connected layers as the network head. These dense layers have always remained the same in such architectures: they use a linear transformation function that is a sum of the products of output vectors with weight vectors, plus a trainable linear bias. In this study, we explore a different mechanism for computing a neuron’s output. By adding a new feature, a product of higher-order output vectors with their respective weight vectors, we transform the conventional linear function into higher-order functions involving terms of order two and above. We compare and analyze the results obtained from six different transformation functions in terms of training and validation accuracies, on a custom neural network architecture and on two benchmark datasets for image classification (CIFAR-10 and CIFAR-100). While the dense layers perform better in all epochs with the new functions, the best performance is observed with a quadratic transformation function. Although the final accuracy achieved by the existing and new models remains the same, initial convergence to higher accuracies is always much faster in the proposed approach, thus significantly reducing the computational time and resources required. This approach can improve the performance of any DL architecture that uses a dense layer, with notably larger gains in architectures that have a very high number of parameters and output classes.
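The abstract describes the higher-order dense layer only at a high level. Below is a minimal PyTorch sketch of one plausible reading, in which each order-k term applies its own weight matrix to the element-wise k-th power of the layer input; with order=1 it reduces to a conventional linear layer, and order=2 corresponds to the quadratic variant the abstract reports as best-performing. The class name HigherOrderDense, the initialization scheme, and the element-wise interpretation of the higher-order term are illustrative assumptions, not the authors' published implementation.

```python
import torch
import torch.nn as nn

class HigherOrderDense(nn.Module):
    """Dense layer with higher-order terms: y = b + sum_k (x ** k) @ W_k^T.

    Assumption: the k-th order term is a separate linear map applied to the
    element-wise k-th power of the input. order=1 is a standard linear layer;
    order=2 adds the quadratic term.
    """

    def __init__(self, in_features: int, out_features: int, order: int = 2):
        super().__init__()
        self.order = order
        # One weight matrix per power of the input, k = 1 .. order.
        self.weights = nn.ParameterList(
            [nn.Parameter(torch.empty(out_features, in_features)) for _ in range(order)]
        )
        self.bias = nn.Parameter(torch.zeros(out_features))
        for w in self.weights:
            nn.init.xavier_uniform_(w)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.bias
        for k, w in enumerate(self.weights, start=1):
            # Element-wise k-th power of the input, then a linear map.
            out = out + (x ** k) @ w.t()
        return out

# Usage: an order-2 (quadratic) drop-in for nn.Linear in a classifier head.
layer = HigherOrderDense(in_features=512, out_features=10, order=2)
logits = layer(torch.randn(8, 512))   # shape: (8, 10)
```

Such a layer adds one extra weight matrix per additional order, so the parameter count grows linearly with the chosen order rather than with the input dimension.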
Pages: 10655-10668 (13 pages)