Leveraging Spatial-semantic Information in Object Detection and Segmentation

Cited by: 0
Authors
Guo Q.-Z. [1 ]
Yuan C. [2 ,3 ]
Affiliations
[1] Department of Computer Science and Technology, Tsinghua University, Beijing
[2] Shenzhen International Graduate School, Tsinghua University, Shenzhen
[3] Pengcheng Laboratory, Shenzhen
Source
Ruan Jian Xue Bao/Journal of Software | 2023, Vol. 34, No. 06
Keywords
attention mechanism; deep learning; feature fusion; image segmentation; object detection;
DOI
10.13328/j.cnki.jos.006509
Abstract
High-quality feature representation boosts performance in object detection and other computer vision tasks. Modern object detectors resort to versatile feature pyramids to enrich representation power, but neglect that pathways of different directions should use different fusing operations to meet their different needs for information flow. This study proposes separated spatial-semantic fusion (SSSF), which uses a channel attention block (CAB) in the top-down pathway to pass semantic information, and a spatial attention block (SAB) with a bottleneck structure in the bottom-up pathway to pass precise location signals to the top levels with fewer parameters and less computation than plain spatial attention without dimension reduction. SSSF is effective and generalizes well: it improves AP by more than 1.3% for object detection, outperforms plain addition as the top-down fusing operation by about 0.8% for semantic segmentation, and boosts instance segmentation performance on all metrics for both bounding-box AP and mask AP. © 2023 Chinese Academy of Sciences. All rights reserved.
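The direction-specific fusion described in the abstract can be illustrated with a minimal, parameter-free numpy sketch. This is an assumption-laden illustration, not the paper's implementation: the real CAB and SAB use learned convolutions, whereas here the channel gate is a sigmoid over global-average-pooled statistics, and the bottleneck is approximated by slicing channels before computing the spatial map.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention_fuse(top, lateral):
    """Top-down fusion (hypothetical sketch): gate the channels of the
    higher-level (semantic) feature before adding it to the lateral one."""
    # top, lateral: (C, H, W) feature maps at the same resolution
    gap = top.mean(axis=(1, 2))          # global average pool -> (C,)
    gate = sigmoid(gap)[:, None, None]   # per-channel weight in (0, 1)
    return lateral + gate * top          # semantic signal passed downward

def spatial_attention_fuse(bottom, lateral, reduction=4):
    """Bottom-up fusion (hypothetical sketch): a spatial gate computed on
    channel-reduced features. Reducing channels first is what keeps the
    parameter/compute cost below plain spatial attention."""
    c = bottom.shape[0]
    reduced = bottom[: max(1, c // reduction)]  # stand-in for a 1x1 reduction conv
    smap = sigmoid(reduced.mean(axis=0))        # (H, W) spatial gate
    return lateral + smap[None, :, :] * bottom  # location signal passed upward

rng = np.random.default_rng(0)
top = rng.standard_normal((8, 4, 4))
lateral = rng.standard_normal((8, 4, 4))
fused_td = channel_attention_fuse(top, lateral)
fused_bu = spatial_attention_fuse(top, lateral)
print(fused_td.shape, fused_bu.shape)  # both (8, 4, 4)
```

The key design point the abstract makes is the asymmetry: semantic content flowing down is selected per channel, while localization detail flowing up is selected per spatial position, with a bottleneck to keep the latter cheap.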
Pages: 2776-2788
Page count: 12