PeLK: Parameter-efficient Large Kernel ConvNets with Peripheral Convolution

Cited by: 12
Authors
Chen, Honghao [1,2,5]
Chu, Xiangxiang [3]
Ren, Yongjian [1,2]
Zhao, Xin [1,2]
Huang, Kaiqi [1,2,4]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
[3] Meituan, Beijing, Peoples R China
[4] CAS Ctr Excellence Brain Sci & Intelligence Techn, Beijing, Peoples R China
[5] Meituan Inc, Beijing, Peoples R China
Source
2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024 | 2024
Funding
National Natural Science Foundation of China; National Key R&D Program of China;
Keywords
DOI
10.1109/CVPR52733.2024.00531
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Recently, some large-kernel ConvNets have struck back with appealing performance and efficiency. However, given the quadratic complexity of convolution, scaling up kernels brings an enormous number of parameters, and the proliferated parameters can induce severe optimization problems. Due to these issues, current CNNs compromise by scaling up only to 51 x 51 in the form of stripe convolution (i.e., 51 x 5 + 5 x 51) and start to saturate as the kernel size continues to grow. In this paper, we delve into addressing these vital issues and explore whether we can continue scaling up kernels for more performance gains. Inspired by human vision, we propose a human-like peripheral convolution that efficiently reduces over 90% of the parameter count of dense grid convolution through parameter sharing, enabling the kernel size to be scaled up to extremely large values. Our peripheral convolution behaves highly similarly to human vision, reducing the complexity of convolution from O(K^2) to O(log K) without degrading performance. Built on this, we propose the Parameter-efficient Large Kernel Network (PeLK). Our PeLK outperforms modern vision Transformers and ConvNet architectures such as Swin, ConvNeXt, RepLKNet and SLaK on various vision tasks, including ImageNet classification, semantic segmentation on ADE20K and object detection on MS COCO. For the first time, we successfully scale up the kernel size of CNNs to an unprecedented 101 x 101 and demonstrate consistent improvements.
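The abstract describes peripheral convolution only in prose. The PyTorch sketch below is a minimal illustration of the underlying parameter-sharing principle, not the authors' implementation: it builds a dense K x K depthwise kernel whose weights are gathered from a small bank, with exponentially wider sharing zones away from the kernel centre, so the number of distinct weights per channel grows roughly like O(log^2 K) instead of K^2. The function and class names (peripheral_index, SharedLargeKernelConv2d), the zone rule, and all hyper-parameters are assumptions for illustration; PeLK's actual sharing layout and its O(log K) result differ in the details.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def peripheral_index(k: int) -> torch.Tensor:
    """Map each of the k offsets along one kernel axis to a shared-weight index.

    Offsets near the centre keep fine granularity, while offsets further out
    fall into exponentially wider zones, so the number of distinct indices
    grows roughly logarithmically in k rather than linearly.
    """
    centre = k // 2
    idx = torch.empty(k, dtype=torch.long)
    for pos in range(k):
        d = abs(pos - centre)
        zone = 0 if d <= 1 else d.bit_length() - 1   # zones: 0-1 | 2-3 | 4-7 | 8-15 | ...
        idx[pos] = zone if pos >= centre else -zone   # keep left/right sides distinct
    return idx - idx.min()                            # shift to 0-based indices


class SharedLargeKernelConv2d(nn.Module):
    """Depthwise K x K convolution whose K*K weights are gathered from a much
    smaller bank of shared parameters (one weight per pair of axis zones)."""

    def __init__(self, channels: int, kernel_size: int = 101):
        super().__init__()
        idx = peripheral_index(kernel_size)                         # shape (K,)
        grid = idx[:, None] * (int(idx.max()) + 1) + idx[None, :]   # (K, K) zone ids
        self.register_buffer("grid", grid)
        n_shared = int(grid.max()) + 1        # e.g. 121 distinct weights for K = 101
        self.bank = nn.Parameter(torch.randn(channels, n_shared) * 0.02)
        self.channels, self.kernel_size = channels, kernel_size

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Expand the small bank into a dense K x K depthwise kernel on the fly.
        weight = self.bank[:, self.grid.reshape(-1)]
        weight = weight.reshape(self.channels, 1, self.kernel_size, self.kernel_size)
        return F.conv2d(x, weight, padding=self.kernel_size // 2,
                        groups=self.channels)


# Example: a 101 x 101 depthwise conv over 8 channels uses about a hundred
# distinct weights per channel instead of 101 * 101 = 10201 dense weights.
if __name__ == "__main__":
    conv = SharedLargeKernelConv2d(channels=8, kernel_size=101)
    out = conv(torch.randn(1, 8, 64, 64))
    print(out.shape, conv.bank.shape)
```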
Pages: 5557-5567
Number of pages: 11