DDOD: Dive Deeper into the Disentanglement of Object Detector

Cited by: 13
Authors
Chen, Zehui [1 ]
Yang, Chenhongyi [2 ]
Chang, Jiahao [1 ]
Zhao, Feng [1 ]
Zha, Zheng-Jun [1 ]
Wu, Feng [1 ]
Affiliations
[1] Univ Sci & Technol China, Sch Informat Sci & Technol, Hefei 230027, Peoples R China
[2] Univ Edinburgh, Sch Engn, Edinburgh EH8 9YL, Scotland
Keywords
Disentanglement; feature representation learning; imbalance learning; object detection
DOI
10.1109/TMM.2023.3264008
CLC number
TP [Automation & Computer Technology]
Discipline code
0812
Abstract
Object detection plays a fundamental role in visual perception and scene understanding. Dense object detection, which localizes objects directly from the feature map, has drawn great attention due to its low cost and high efficiency. Despite long development, the training pipeline of dense object detectors is still compromised by several conjunctions. In this paper, we demonstrate three conjunctions in the current paradigm of one-stage detectors: 1) only samples assigned as positive by the classification head are used to train the regression head; 2) classification and regression share the same input features and computational fields, as defined by the parallel head architecture; and 3) samples distributed across different feature pyramid layers are treated equally when computing the loss. Based on these observations, we propose the Disentangled Dense Object Detector (DDOD), a simple, direct, and efficient framework for 2D detection with strong performance. We further derive two DDOD variants (i.e., DR-CNN and DDETR) following the basic one-stage/two-stage and the recently developed transformer-based pipelines. Specifically, we develop three effective disentanglement mechanisms and integrate them into current state-of-the-art object detectors. Extensive experiments on the MS COCO benchmark show that our approach yields significant gains with negligible extra overhead on various detectors. Notably, our best model reaches 55.4 mAP on the COCO test-dev set, achieving new state-of-the-art performance on this competitive benchmark. We additionally validate our model on several challenging tasks, including small object detection and crowded object detection; the results further confirm the benefit of disentangling these conjunctions.
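The first conjunction above (the regression head reusing the classification head's positives) can be illustrated with a minimal sketch of disentangled label assignment. This is not the paper's assigner: it assumes a plain IoU-threshold scheme, and the function names and the example thresholds (`cls_thr`, `reg_thr`) are purely illustrative. The point is only that the two heads receive independently chosen positive sets.

```python
def iou(box, gt):
    """IoU between two boxes in (x1, y1, x2, y2) format."""
    iw = max(0.0, min(box[2], gt[2]) - max(box[0], gt[0]))
    ih = max(0.0, min(box[3], gt[3]) - max(box[1], gt[1]))
    inter = iw * ih
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return inter / (area_box + area_gt - inter)

def disentangled_assign(anchors, gt, cls_thr=0.5, reg_thr=0.4):
    """Return separate positive masks for the classification and
    regression heads. Instead of reusing the classification positives
    for regression (the entangled setup), each head thresholds the
    anchor-GT IoUs independently; here the regression set is looser."""
    ious = [iou(a, gt) for a in anchors]
    cls_pos = [v >= cls_thr for v in ious]
    reg_pos = [v >= reg_thr for v in ious]
    return cls_pos, reg_pos
```

In the entangled baseline, `reg_pos` would simply be a copy of `cls_pos`; decoupling the two sets lets each head train on the samples best suited to its own objective.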
Pages: 284-298
Page count: 15