Hyneter: Hybrid Network Transformer for Multiple Computer Vision Tasks

Cited by: 3
Authors
Chen, Dong [1]
Miao, Duoqian [2]
Zhao, Xuerong [3]
Affiliations
[1] Tongji Univ, Minist Educ, Key Lab Embedded Syst & Serv Comp, Shanghai 200092, Peoples R China
[2] Tongji Univ, Shanghai 200092, Peoples R China
[3] Shanghai Normal Univ, Comp Sci & Technol Sch, Shanghai 201418, Peoples R China
Keywords
Convolutional neural network (CNN); hybrid network; object detection; transformer
DOI
10.1109/TII.2024.3367043
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this article, we point out that the essential difference between convolutional neural network (CNN)-based and transformer-based detectors, which causes the worse performance of transformer-based methods on small objects, is the gap between local information and global dependencies in feature extraction and propagation. To address this difference, we propose a new vision transformer, called Hybrid Network Transformer (Hyneter), after pre-experiments indicating that the gap causes CNN-based and transformer-based methods to improve results unevenly across objects of different sizes. Unlike the divide-and-conquer strategy of previous methods, Hyneters consist of a hybrid network backbone (HNB) and a dual switching (DS) module, which integrate local information and global dependencies and transfer them simultaneously. Based on the balance strategy, HNB extends the range of local information by embedding convolution layers into transformer blocks in parallel, and DS adjusts excessive reliance on global dependencies outside the patch. Ablation studies illustrate that Hyneters achieve state-of-the-art performance by a large margin of +2.1 to 13.2 AP on COCO and +3.1 to 6.5 mIoU on VisDrone, with lighter model size and lower computational cost in object detection. Furthermore, Hyneters achieve state-of-the-art results on multiple computer vision tasks, such as object detection (60.1 AP on COCO and 46.1 AP on VisDrone), semantic segmentation (54.3 AP on ADE20K), and instance segmentation (48.5 AP(mask) on COCO), and surpass previous best methods. The code will be publicly available later.
Pages: 8773-8785
Number of pages: 13
Related papers
53 in total
  • [1] Attention Augmented Convolutional Networks
    Bello, Irwan
    Zoph, Barret
    Vaswani, Ashish
    Shlens, Jonathon
    Le, Quoc V.
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 3285-3294
  • [2] VisDrone-DET2021: The Vision Meets Drone Object Detection Challenge Results
    Cao, Yaru
    He, Zhijian
    Wang, Lujia
    Wang, Wenguan
    Yuan, Yixuan
    Zhang, Dingwen
    Zhang, Jinglin
    Zhu, Pengfei
    Van Gool, Luc
    Han, Junwei
    Hoi, Steven
    Hu, Qinghua
    Liu, Ming
    Cheng, Chong
    Liu, Fanfan
    Cao, Guojin
    Li, Guozhen
    Wang, Hongkai
    He, Jianye
    Wan, Junfeng
    Wan, Qi
    Zhao, Qi
    Lyu, Shuchang
    Zhao, Wenzhe
    Lu, Xiaoqiang
    Zhu, Xingkui
    Liu, Yingjie
    Lv, Yixuan
    Ma, Yujing
    Yang, Yuting
    Wang, Zhe
    Xu, Zhenyu
    Luo, Zhipeng
    Zhang, Zhimin
    Zhang, Zhiguang
    Li, Zihao
    Zhang, Zixiao
    [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021: 2847-2854
  • [3] Carion N., 2020, ECCV
  • [4] MixFormer: Mixing Features across Windows and Dimensions
    Chen, Qiang
    Wu, Qiman
    Wang, Jian
    Hu, Qinghao
    Hu, Tao
    Ding, Errui
    Cheng, Jian
    Wang, Jingdong
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022: 5239-5249
  • [5] You Only Look One-level Feature
    Chen, Qiang
    Wang, Yingming
    Yang, Tong
    Zhang, Xiangyu
    Cheng, Jian
    Sun, Jian
    [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021: 13034-13043
  • [6] Mobile-Former: Bridging MobileNet and Transformer
    Chen, Yinpeng
    Dai, Xiyang
    Chen, Dongdong
    Liu, Mengchen
    Dong, Xiaoyi
    Yuan, Lu
    Liu, Zicheng
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022: 5260-5269
  • [7] Chu XX, 2021, ADV NEUR IN
  • [8] Dynamic Head: Unifying Object Detection Heads with Attentions
    Dai, Xiyang
    Chen, Yinpeng
    Xiao, Bin
    Chen, Dongdong
    Liu, Mengchen
    Yuan, Lu
    Zhang, Lei
    [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021: 7369-7378
  • [9] UP-DETR: Unsupervised Pre-training for Object Detection with Transformers
    Dai, Zhigang
    Cai, Bolun
    Lin, Yugeng
    Chen, Junying
    [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021: 1601-1610
  • [10] Control Distance IoU and Control Distance IoU Loss for Better Bounding Box Regression
    Dong, Chen
    Miao, Duoqian
    [J]. PATTERN RECOGNITION, 2023, 137