Optimizing nonlinear activation function for convolutional neural networks

Cited by: 37
Authors
Varshney, Munender [1]
Singh, Pravendra [1]
Affiliations
[1] Indian Inst Technol Kanpur, Dept Comp Sci & Engn, Kanpur, Uttar Pradesh, India
Keywords
FReLU; ReLU; CNN; Convolutional neural network; Activation function
DOI
10.1007/s11760-021-01863-z
CLC classification number
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology]
Discipline classification code
0808; 0809
Abstract
Activation functions play a critical role in the training and performance of deep convolutional neural networks (CNNs). Currently, the rectified linear unit (ReLU) is the most commonly used activation function in deep CNNs. ReLU is a piecewise linear function that outputs the input directly if it is positive and zero otherwise. In this work, we propose a novel approach that generalizes the ReLU activation function using multiple learnable slope parameters. These slope parameters are optimized for every channel, so each channel learns a more general activation function (a variant of ReLU). This activation is named the fully parametric rectified linear unit (FReLU) and is trained with an alternating optimization technique that learns one set of parameters while keeping the other set frozen. Our experiments show that FReLU outperforms ReLU and its other variants and generalizes across tasks such as image classification, object detection, and action recognition in videos. With FReLU in place of ReLU, the Top-1 classification accuracy on ImageNet improves by 3.75% for MobileNet and by approximately 2% for ResNet-50. We also provide various analyses for better interpretability of the proposed activation function.
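As a rough illustration of the idea described in the abstract, the PyTorch sketch below implements a per-channel parametric ReLU variant with one learnable slope per channel for each linear piece, initialized so it starts as plain ReLU. The class name, two-slope parameterization, initialization, and the freeze helper are illustrative assumptions, not the authors' code; the exact FReLU parameterization is given in the paper.

    import torch
    import torch.nn as nn

    class FReLUSketch(nn.Module):
        """Per-channel parametric ReLU variant: one learnable slope per
        channel for each linear piece. Illustrative sketch only; the
        exact FReLU parameterization in the paper may differ."""

        def __init__(self, num_channels: int):
            super().__init__()
            # Assumed initialization: slope 1 on the positive side and
            # slope 0 on the negative side, i.e. exactly ReLU at start.
            self.pos_slope = nn.Parameter(torch.ones(num_channels))
            self.neg_slope = nn.Parameter(torch.zeros(num_channels))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x has shape (N, C, H, W); reshape the per-channel slopes
            # so they broadcast over batch and spatial dimensions.
            pos = self.pos_slope.view(1, -1, 1, 1)
            neg = self.neg_slope.view(1, -1, 1, 1)
            return torch.where(x > 0, pos * x, neg * x)

    def freeze(params, frozen: bool) -> None:
        """Toggle gradients for one parameter set, so network weights
        and slope parameters can be updated in alternation."""
        for p in params:
            p.requires_grad_(not frozen)

An alternating schedule would then freeze the slope parameters while the convolutional weights are updated, and vice versa, matching the abstract's description of training one set of parameters with the other set frozen.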
Pages: 1323-1330
Number of pages: 8