Efficient Fine Tuning for Fashion Object Detection

Cited by: 3
Authors
Ma, Benjiang [1]
Xu, Wenjin [1]
Affiliations
[1] Qingdao Univ Sci & Technol, Sch Informat Sci & Technol, Qingdao 266061, Peoples R China
Keywords
object detection; fine-tuning; clothing dataset
DOI
10.3390/s23136083
Chinese Library Classification
O65 [Analytical Chemistry]
Subject classification codes
070302; 081704
Abstract
Pre-trained models have achieved success in object detection. However, dataset noise and a lack of domain-specific data still weaken their zero-shot capabilities in specialized fields such as fashion imaging. We address this by constructing a novel clothing object detection benchmark, Garment40K, which includes more than 140,000 human images with bounding boxes and over 40,000 clothing images. Each clothing item in the dataset is accompanied by its category and a textual description. The dataset covers two major categories, pants and tops, which are further divided into 15 fine-grained subclasses, providing a rich, high-quality clothing resource. Leveraging this dataset, we propose an efficient fine-tuning method based on the Grounding DINO framework to tackle missed and false detections of clothing targets. The method adds similarity loss constraints and adapter modules, yielding an enhanced model named Improved Grounding DINO. By fine-tuning only a small number of additional adapter parameters, we considerably reduce computational cost while achieving performance comparable to full-parameter fine-tuning. This allows the model to be conveniently deployed on a variety of low-cost visual sensors. Improved Grounding DINO demonstrates considerable performance improvements in computer vision applications in the clothing domain.
Pages: 17