Boosting grape bunch detection in RGB-D images using zero-shot annotation with Segment Anything and GroundingDINO

Times Cited: 1
Authors
Devanna, Rosa Pia [1 ]
Reina, Giulio [2 ]
Cheein, Fernando Auat [3 ,4 ]
Milella, Annalisa [1 ]
Affiliations
[1] Natl Res Council Italy CNR, Inst Intelligent Ind Technol & Syst Adv Mfg STIIM, Via Amendola 122 D-O, I-70126 Bari, Italy
[2] Polytech Univ Bari, Dept Mech Math & Management, Via Orabona 4, I-70125 Bari, Italy
[3] Harper Adams Univ, Dept Engn, Edgmond, England
[4] Univ Tecn Federico Santa Maria, Adv Ctr Elect & Elect Engn AC3E, Dept Elect Engn, Valparaiso, Chile
Keywords
Grape bunch detection; Instance segmentation; Zero-shot networks; Precision agriculture; Agriculture robotics
DOI
10.1016/j.compag.2024.109611
Chinese Library Classification (CLC): S [Agricultural Sciences]
Discipline code: 09
Abstract
The latest advances in artificial intelligence, particularly in object recognition and segmentation, provide unprecedented opportunities for precision agriculture. This work investigates the use of state-of-the-art AI models, namely Meta's Segment Anything Model (SAM) and GroundingDINO, for the task of grape cluster detection in vineyards. Three different methods aimed at enhancing the instance segmentation process are proposed: (i) SAM-Refine (SAM-R), which uses SAM to refine a previously proposed depth-based clustering approach, referred to as DepthSeg; (ii) SAM-Segmentation (SAM-S), which integrates SAM with a pre-trained semantic segmentation model to improve cluster separation; and (iii) AutoSAM-Dino (ASD), which eliminates the need for manual labeling and transfer learning through the combined use of GroundingDINO and SAM. Performance is analyzed in terms of both object counting and pixel-level segmentation accuracy against a manually labeled ground truth. Metrics such as mean Average Precision (mAP), Intersection over Union (IoU), precision, and recall are calculated to assess system performance. Compared to the original DepthSeg algorithm, SAM-R slightly improves object counting (mAP: +0.5%) and substantially improves pixel-level segmentation (IoU: +17.0%). SAM-S, despite a decrease in mAP, improves segmentation accuracy (IoU: +13.9%, Precision: +9.2%, Recall: +11.7%). Similarly, ASD, although with a lower mAP, shows a significant accuracy enhancement (IoU: +7.8%, Precision: +4.2%, Recall: +4.9%). Additionally, from a labor-effort point of view, the proposed instance segmentation techniques require far less time than manual labeling.
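The pixel-level metrics reported in the abstract (IoU, precision, recall) can be computed directly from binary segmentation masks. The following is a minimal sketch of such a computation, assuming NumPy boolean masks; it is an illustrative implementation, not the authors' evaluation code:

```python
import numpy as np

def pixel_metrics(pred: np.ndarray, gt: np.ndarray):
    """IoU, precision and recall between a predicted and a ground-truth binary mask."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()    # pixels correctly labeled as grape cluster
    fp = np.logical_and(pred, ~gt).sum()   # predicted pixels absent from ground truth
    fn = np.logical_and(~pred, gt).sum()   # ground-truth pixels the prediction missed
    union = tp + fp + fn
    iou = tp / union if union else 1.0
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return iou, precision, recall

# Toy example: two partially overlapping 4x4 square masks on a 10x10 grid.
a = np.zeros((10, 10)); a[2:6, 2:6] = 1
b = np.zeros((10, 10)); b[3:7, 3:7] = 1
iou, p, r = pixel_metrics(a, b)  # overlap is 9 px, each mask 16 px
```

In the toy example the intersection is 9 pixels and the union 23, giving IoU = 9/23 and precision = recall = 9/16; the paper aggregates such per-mask scores against the manually labeled ground truth.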
Pages: 16