ShuffleNeMt: modern lightweight convolutional neural network architecture

Cited by: 0
Authors
Zhu, Meng [1 ]
Min, Weidong [1 ,2 ,3 ]
Han, Qing [1 ,2 ,3 ]
Zhan, Guowei [1 ]
Fu, Qiyan [1 ]
Li, Jiahao [1 ]
Affiliations
[1] Nanchang Univ, Sch Math & Comp Sci, Nanchang 330031, Peoples R China
[2] Nanchang Univ, Inst Metaverse, Nanchang 330031, Peoples R China
[3] Nanchang Univ, Jiangxi Prov Key Lab Virtual Real, Nanchang 330031, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Convolutional neural networks; Self-attention; Lightweight;
DOI
10.1007/s10044-024-01327-3
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Lightweight convolutional neural networks (CNNs) are designed for mobile devices and embedded systems, aiming at lower resource consumption and faster inference. However, these networks are limited in their ability to capture long-range dependencies because convolutional operations are inherently local. Introducing self-attention into such CNNs can effectively capture global information, but at the cost of significantly slower inference. To address these problems, we propose a novel lightweight network called ShuffleNeMt. ShuffleNeMt modernizes ShuffleNetV2 by incorporating several strategies: pre-norm residual learning, residual depth scaling, visual self-attention, and non-monotonic activation functions. Visual self-attention is used in only one layer, positioned in the middle of the network. In this way, ShuffleNeMt not only achieves efficient resource utilization and fast inference speed, but also captures long-range spatial dependencies. Extensive experimental results demonstrate the superiority of ShuffleNeMt over existing lightweight architectures. For example, on the CIFAR-100 image classification dataset, ShuffleNeMt-1.5 achieved a top-1 error rate of 34.05% using 2.588M parameters, better than the 36.86% top-1 error rate achieved by MobileNetV3-Large-0.8 using 2.809M parameters.
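The abstract names the key strategies but not their exact implementation. The following minimal PyTorch sketch illustrates two of them: a pre-norm residual unit with a non-monotonic activation (Mish is assumed here), and a single self-attention layer intended to sit once in the middle of the network. Block structure, channel handling, and head count are illustrative assumptions, not the paper's exact design.

import torch
import torch.nn as nn

class PreNormResidualBlock(nn.Module):
    # Illustrative pre-norm residual unit: normalization precedes the
    # convolutional branch, and a non-monotonic activation (Mish) is
    # assumed. This is a sketch, not the paper's exact block.
    def __init__(self, channels: int):
        super().__init__()
        self.norm = nn.BatchNorm2d(channels)
        self.branch = nn.Sequential(
            # Depthwise 3x3 plus pointwise 1x1, in the spirit of
            # ShuffleNetV2-style units.
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels, bias=False),
            nn.Conv2d(channels, channels, 1, bias=False),
            nn.Mish(),
        )

    def forward(self, x):
        # Pre-norm residual connection: normalize, transform, then add.
        return x + self.branch(self.norm(x))

class SingleMidAttention(nn.Module):
    # One multi-head self-attention layer, meant to be inserted only once
    # at a middle stage so global spatial context is captured without
    # paying the attention cost at every layer. Head count is an assumption.
    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)   # (B, H*W, C) token sequence
        q = self.norm(seq)
        out, _ = self.attn(q, q, q)          # global self-attention over positions
        return x + out.transpose(1, 2).reshape(b, c, h, w)

# Toy usage: a feature map passes through one residual block and the
# single mid-network attention layer.
feat = torch.randn(1, 64, 16, 16)
feat = PreNormResidualBlock(64)(feat)
feat = SingleMidAttention(64)(feat)

Confining attention to one middle layer keeps its quadratic cost in the number of spatial positions to a single stage where resolution is already reduced, which is consistent with the abstract's claim of fast inference combined with long-range modeling.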
Pages: 11