ShuffleNeMt: modern lightweight convolutional neural network architecture

Cited by: 0
Authors
Zhu, Meng [1 ]
Min, Weidong [1 ,2 ,3 ]
Han, Qing [1 ,2 ,3 ]
Zhan, Guowei [1 ]
Fu, Qiyan [1 ]
Li, Jiahao [1 ]
Affiliations
[1] Nanchang Univ, Sch Math & Comp Sci, Nanchang 330031, Peoples R China
[2] Nanchang Univ, Inst Metaverse, Nanchang 330031, Peoples R China
[3] Nanchang Univ, Jiangxi Prov Key Lab Virtual Real, Nanchang 330031, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Convolutional neural networks; Self-attention; Lightweight;
DOI
10.1007/s10044-024-01327-3
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Lightweight convolutional neural networks (CNNs) are designed for mobile devices and embedded systems, aiming at lower resource consumption and faster inference. However, these networks are limited in their ability to capture long-range dependencies because convolutional operations are inherently local. Introducing self-attention into such CNNs can effectively capture global information, but at the cost of significantly slower inference. To address these problems, we propose a novel lightweight network called ShuffleNeMt. ShuffleNeMt modernizes ShuffleNetV2 by incorporating several strategies: pre-norm residual learning, residual depth scaling, visual self-attention, and non-monotonic activation functions. Visual self-attention is used in only one layer, positioned in the middle of the network. In this way, ShuffleNeMt not only achieves efficient resource utilization and fast inference speed, but also captures long-range spatial dependencies. Extensive experimental results demonstrate the superiority of ShuffleNeMt over existing lightweight architectures. For example, on the CIFAR-100 image classification dataset, ShuffleNeMt-1.5 achieved a top-1 error rate of 34.05% using 2.588M parameters, better than the 36.86% top-1 error rate achieved by MobileNetV3-Large-0.8 using 2.809M parameters.
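The abstract names the key strategies but not their exact implementation. The following minimal PyTorch sketch illustrates two of them: a pre-norm residual unit with a non-monotonic activation (Mish is assumed here), and a single self-attention layer intended to sit once in the middle of the network. Block structure, channel handling, and head count are illustrative assumptions, not the paper's exact design.

import torch
import torch.nn as nn

class PreNormResidualBlock(nn.Module):
    # Illustrative pre-norm residual unit: normalization precedes the
    # convolutional branch, and a non-monotonic activation (Mish) is
    # assumed. This is a sketch, not the paper's exact block.
    def __init__(self, channels: int):
        super().__init__()
        self.norm = nn.BatchNorm2d(channels)
        self.branch = nn.Sequential(
            # Depthwise 3x3 plus pointwise 1x1, in the spirit of
            # ShuffleNetV2-style units.
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels, bias=False),
            nn.Conv2d(channels, channels, 1, bias=False),
            nn.Mish(),
        )

    def forward(self, x):
        # Pre-norm residual connection: normalize, transform, then add.
        return x + self.branch(self.norm(x))

class SingleMidAttention(nn.Module):
    # One multi-head self-attention layer, meant to be inserted only once
    # at a middle stage so global spatial context is captured without
    # paying the attention cost at every layer. Head count is an assumption.
    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)   # (B, H*W, C) token sequence
        q = self.norm(seq)
        out, _ = self.attn(q, q, q)          # global self-attention over positions
        return x + out.transpose(1, 2).reshape(b, c, h, w)

# Toy usage: a feature map passes through one residual block and the
# single mid-network attention layer.
feat = torch.randn(1, 64, 16, 16)
feat = PreNormResidualBlock(64)(feat)
feat = SingleMidAttention(64)(feat)

Confining attention to one middle layer keeps its quadratic cost in the number of spatial positions to a single stage where resolution is already reduced, which is consistent with the abstract's claim of fast inference combined with long-range modeling.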
Pages: 11