Hardware-Aware and Efficient Feature Fusion Network Search

Cited by: 0
Authors
Guo J.-M. [1 ,2 ,3 ]
Zhang R. [1 ,2 ]
Zhi T. [1 ]
He D.-Y. [2 ]
Huang D. [1 ,2 ,3 ]
Chang M. [2 ]
Zhang X.-S. [1 ,2 ]
Guo Q. [1 ]
Affiliations
[1] State Key Laboratory of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing
[2] Cambricon Technologies, Beijing
[3] University of Chinese Academy of Sciences, Beijing
Source
Jisuanji Xuebao/Chinese Journal of Computers | 2022 / Vol. 45 / No. 11
Keywords
Hardware overhead; Neural architecture search; Object detection
DOI
10.11897/SP.J.1016.2022.02420
Abstract
The fusion network is a representative module in object detection frameworks that fuses multi-scale features to improve detection accuracy. Previous work on fusion network design has mainly focused on the topology of the fusion path to improve object detection performance. However, the required hardware resource overhead and the influence of feature selection and feature fusion operations on detection performance are ignored. In this paper, we propose a feature fusion network named Attention-aware Fusion Network (AFN), which has a strong capacity for fusing multi-scale features for object detection. Through software-hardware co-design, it automatically searches for network architectures sensitive to hardware cost (parameter storage, computation time, etc.) and jointly optimizes the fusion network's features, paths, and operations for deployment. In this paper, we first summarize and propose three key factors that should be considered in the design of a feature fusion network: fusion feature selection, fusion path, and fusion mode. We must also consider the hardware overhead of deploying the algorithm on the target platform. However, these design factors span a huge design space containing an enormous number of choices, so manually designing the optimal architecture of the fusion neck is very difficult. We therefore employ a neural architecture search (NAS) method to automatically design the feature fusion network. We propose three kinds of search units: a feature search unit, a fusion path search unit, and a fusion mode search unit. The feature search unit searches for the most appropriate input features at each scale, rather than fixing them to the top layer of each stage. The fusion path search unit takes all possible cross-scale fusing connections among groups as the search space and searches for the optimal connections.
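To make the size of this search space concrete, the cross-scale connection space explored by a fusion path search unit can be sketched as below. This is a hypothetical illustration, not the paper's code: it assumes each directed connection between two scales can be independently toggled on or off.

```python
# Hypothetical sketch: the search space of a fusion path search unit,
# modeled as all subsets of directed cross-scale connections.

def fusion_path_space(num_scales):
    """Enumerate candidate cross-scale fusing connections.

    A candidate fusion path is any subset of directed (src, dst) edges
    between distinct scales; the search unit selects one subset.
    """
    edges = [(src, dst)
             for src in range(num_scales)
             for dst in range(num_scales)
             if src != dst]
    # Each edge is independently on or off, so the path space
    # contains 2**len(edges) candidates.
    return edges, 2 ** len(edges)

edges, size = fusion_path_space(4)  # e.g., four pyramid levels
print(len(edges), size)  # 12 edges, 4096 candidate paths
```

Even for only four scales the space already holds thousands of candidate paths, which is why an automated search is preferable to manual design.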
The fusion mode search unit contains a variety of candidate fusion operations and decides how to fuse features of multiple scales. In particular, this unit is attention-aware: it uses several kinds of attention mechanisms, along with the commonly used add operation, as candidate fusion operations. We use a NAS algorithm based on an evolutionary algorithm and apply weight reuse and grouped fusion when designing the search units, which reduces computational and memory cost. We also take the hardware cost of the feature fusion network on the target hardware as a search objective, so we can achieve a good trade-off between accuracy and computational cost on the target hardware. We evaluate our method on the widely used COCO detection dataset, compare the searched feature fusion network with several advanced feature fusion networks, and report both detection accuracy and network complexity. We also carry out experiments to evaluate our design choices for the search units and the search objective. The experimental results show that with a ResNet50 backbone, compared with the searched network NAS-FPN, the parameter count and computation are reduced by 29.6% and 22.3%, respectively, at similar detection accuracy. Compared with the manually designed network FPN, AP increases by 2.1%. With a VGG backbone, compared to the searched network Auto-FPN, AP increases by 1.7%. © 2022, Science Press. All rights reserved.
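The hardware-aware evolutionary search described above could be sketched as follows. This is a hypothetical illustration, not the paper's implementation: the fitness weights `alpha`/`beta`, the bit-vector architecture encoding, and the toy accuracy model are all assumptions made for the sketch.

```python
# Hypothetical sketch: evolutionary NAS whose fitness trades off
# accuracy against hardware cost (parameters, latency).
import random

def fitness(accuracy, params_m, latency_ms, alpha=0.01, beta=0.005):
    # Penalize hardware overhead; alpha and beta are assumed weights
    # balancing accuracy against parameter count and latency.
    return accuracy - alpha * params_m - beta * latency_ms

def evolve(population, evaluate, mutate, generations=30, k=4):
    """Elitist evolutionary search: keep the k fittest architectures
    each generation and refill the population with mutated copies."""
    for _ in range(generations):
        scored = sorted(population, key=evaluate, reverse=True)
        parents = scored[:k]
        children = [mutate(random.choice(parents))
                    for _ in population[k:]]
        population = parents + children
    return max(population, key=evaluate)

# Toy demo: architectures are 8-bit vectors; "accuracy" is the mean bit.
random.seed(0)
pop = [[random.randint(0, 1) for _ in range(8)] for _ in range(12)]

def score(arch):
    acc = sum(arch) / len(arch)                 # toy "accuracy"
    return fitness(acc, params_m=sum(arch), latency_ms=0.0)

def flip(arch):
    a = list(arch)
    a[random.randrange(len(a))] ^= 1            # flip one design choice
    return a

best = evolve(pop, score, flip)
```

Because the elite parents are carried over unchanged, the best fitness never decreases across generations; the hardware terms simply shift which architectures count as "best" toward cheaper ones.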
Pages: 2420-2432
Page count: 12
References
38 in total
  • [1] Girshick R., Fast R-CNN, Proceedings of the IEEE International Conference on Computer Vision, pp. 1440-1448, (2015)
  • [2] Ren S, He K, Girshick R, et al., Faster R-CNN: Towards real-time object detection with region proposal networks, Advances in Neural Information Processing Systems, pp. 91-99, (2015)
  • [3] Liu W, Anguelov D, Erhan D, et al., SSD: Single shot MultiBox detector, Proceedings of the European Conference on Computer Vision, pp. 21-37, (2016)
  • [4] Lin T Y, Goyal P, Girshick R, et al., Focal loss for dense object detection, Proceedings of the IEEE International Conference on Computer Vision, pp. 2980-2988, (2017)
  • [5] Girshick R, Donahue J, Darrell T, et al., Rich feature hierarchies for accurate object detection and semantic segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580-587, (2014)
  • [6] Lin T Y, Dollar P, Girshick R, et al., Feature pyramid networks for object detection, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2117-2125, (2017)
  • [7] Tan M, Pang R, Le Q V., EfficientDet: Scalable and efficient object detection, CoRR, (2019)
  • [8] Liu S, Qi L, Qin H, et al., Path aggregation network for instance segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8759-8768, (2018)
  • [9] Ghiasi G, Lin T Y, Le Q V., NAS-FPN: Learning scalable feature pyramid architecture for object detection, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7036-7045, (2019)
  • [10] Xu H, Yao L, Zhang W, et al., Auto-FPN: Automatic network architecture adaptation for object detection beyond classification, Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6649-6658, (2019)