Plain-Det: A Plain Multi-dataset Object Detector

Cited by: 0

Authors:

Shi, Cheng [1]

Zhu, Yuchen [1]

Yang, Sibei [1]

Affiliation:

[1] ShanghaiTech Univ, Sch Informat Sci & Technol, Shanghai, Peoples R China

Source:

COMPUTER VISION - ECCV 2024, PT V | 2025 / Volume 15063

Funding:

National Natural Science Foundation of China;

Keywords:

Object detection; Multiple datasets; Proposal generation;

DOI:

10.1007/978-3-031-72652-1_13

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Recent advancements in large-scale foundational models have sparked widespread interest in training highly proficient large vision models. A common consensus revolves around the necessity of aggregating extensive, high-quality annotated data. However, given the inherent challenges in annotating dense tasks in computer vision, such as object detection and segmentation, a practical strategy is to combine and leverage all available data for training purposes. In this work, we propose Plain-Det, which offers flexibility to accommodate new datasets, robustness in performance across diverse datasets, training efficiency, and compatibility with various detection architectures. We utilize Def-DETR, with the assistance of Plain-Det, to achieve a mAP of 51.9 on COCO, matching the current state-of-the-art detectors. We conduct extensive experiments on 13 downstream datasets and Plain-Det demonstrates strong generalization capability. Code is released at https://github.com/ChengShiest/Plain-Det.

Citation

Pages: 210-226

Page count: 17

References (45 in total):

[1] End-to-End Object Detection with Transformers. Carion, Nicolas; Massa, Francisco; Synnaeve, Gabriel; Usunier, Nicolas; Kirillov, Alexander; Zagoruyko, Sergey. COMPUTER VISION - ECCV 2020, PT I, 2020, 12346: 213-229

[2] Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts. Changpinyo, Soravit; Sharma, Piyush; Ding, Nan; Soricut, Radu. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021: 3557-3567

[3] Chen Q, 2023, Arxiv, DOI arXiv:2207.13085

[4] ScaleDet: A Scalable Multi-Dataset Object Detector. Chen, Yanbei; Wang, Manchen; Mittal, Abhay; Xu, Zhenlin; Favaro, Paolo; Tighe, Joseph; Modolo, Davide. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023: 7288-7297

[5] Dynamic Convolution: Attention over Convolution Kernels. Chen, Yinpeng; Dai, Xiyang; Liu, Mengchen; Chen, Dongdong; Yuan, Lu; Liu, Zicheng. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020: 11027-11036

[6] Deng J, 2009, PROC CVPR IEEE, P248, DOI 10.1109/CVPRW.2009.5206848

[7] Fast R-CNN. Girshick, Ross. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015: 1440-1448

[8] Gu X., 2021, arXiv

[9] LVIS: A Dataset for Large Vocabulary Instance Segmentation. Gupta, Agrim; Dollar, Piotr; Girshick, Ross. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019: 5351-5359

[10] He KM, 2018, Arxiv, DOI 10.48550/arXiv.1703.06870
