CenterNet++ for Object Detection

Cited by: 25
Authors
Duan, Kaiwen [1 ]
Bai, Song [2 ]
Xie, Lingxi [3 ]
Qi, Honggang [1 ]
Huang, Qingming [1 ]
Tian, Qi [3 ]
Affiliations
[1] Univ Chinese Acad Sci, Sch Comp Sci & Technol, Beijing 101408, Peoples R China
[2] Univ Oxford, Oxford OX1 2JD, Oxfordshire, England
[3] Huawei Inc, Shenzhen 518129, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Anchor-free; bottom-up; deep learning; object detection;
DOI
10.1109/TPAMI.2023.3342120
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
There are two mainstream approaches to object detection: top-down and bottom-up. The state-of-the-art approaches are mainly top-down methods. In this paper, we demonstrate that bottom-up approaches show competitive performance compared with top-down approaches and have higher recall rates. Our approach, named CenterNet, detects each object as a triplet of keypoints: the top-left corner, the bottom-right corner, and the center keypoint. We first group the corners according to some designed cues and then confirm the object locations based on the center keypoints. The corner keypoints allow the approach to detect objects of various scales and shapes, while the center keypoint reduces the confusion introduced by a large number of false-positive proposals. Our approach is an anchor-free detector because it does not need to define explicit anchor boxes. We adapt our approach to backbones with different structures, including 'hourglass'-like networks and 'pyramid'-like networks, which detect objects in single-resolution and multi-resolution feature maps, respectively. On the MS-COCO dataset, CenterNet achieves average precisions (APs) of 53.7% with a Res2Net-101 backbone and 57.1% with a Swin-Transformer backbone, outperforming all existing bottom-up detectors and achieving state-of-the-art performance. We also design a real-time CenterNet model, which achieves a good trade-off between accuracy and speed, with an AP of 43.6% at 30.5 frames per second (FPS).
Pages: 3509-3521
Number of pages: 13
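
The triplet idea in the abstract (pair corners into candidate boxes, then keep only boxes whose central region contains a predicted center keypoint) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the fixed central-region scale of 3, and the class-agnostic check are simplifications made for clarity.

import numpy as np

def central_region(box, scale=3.0):
    # Shrink the box around its center; the fixed scale here is an
    # illustrative assumption (the paper adapts the region to box size).
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    w, h = (x2 - x1) / scale, (y2 - y1) / scale
    return cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0

def verify_with_centers(corner_boxes, center_points):
    # Keep only corner-paired boxes whose central region contains at
    # least one predicted center keypoint (the abstract's triplet check).
    kept = []
    for box in corner_boxes:
        rx1, ry1, rx2, ry2 = central_region(box)
        inside = ((center_points[:, 0] >= rx1) & (center_points[:, 0] <= rx2) &
                  (center_points[:, 1] >= ry1) & (center_points[:, 1] <= ry2))
        if inside.any():
            kept.append(box)
    return np.asarray(kept)

# Toy usage: the first box is confirmed by the center keypoint at (30, 30);
# the second has no supporting center keypoint and is discarded.
boxes = np.array([[10.0, 10.0, 50.0, 50.0], [0.0, 0.0, 100.0, 20.0]])
centers = np.array([[30.0, 30.0]])
print(verify_with_centers(boxes, centers))   # -> [[10. 10. 50. 50.]]

In the full method, corner grouping relies on learned cues and the central region is scaled with object size and checked per class; the sketch above keeps only the geometric filtering step.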