Visual detection of moving stacked objects based on efficient multi-scale grouping and improved multi-head self-attention

Cited by: 0
Authors
Fei, Sheng-wei [1]
Zhang, Hao-jie [1]
Affiliations
[1] Donghua Univ, Coll Mech Engn, Shanghai 201620, Peoples R China
Keywords
deep learning; target detection; RT-DETR; attention mechanism; cascade packet;
DOI
10.1088/1361-6501/adb16e
Chinese Library Classification (CLC) number
T [Industrial Technology]
Discipline classification code
08
Abstract
In everyday fruit sorting, fruits pile up and occlude the target fruits. To address this, the paper presents an object detection algorithm built on efficient multi-scale grouping and an improved multi-head self-attention. The algorithm takes the real-time detection transformer (RT-DETR) as its baseline to ease optimization and improve robustness. It introduces efficient multi-scale attention to preserve channel information, optimizes the multi-head self-attention, and adopts cascade grouping to reduce computational redundancy. It also uses a new loss function, Inner-MPDIoU, which combines the bounding-box similarity metric MPDIoU with the Inner-IoU idea to improve accuracy on moving occluded targets. In experiments, the optimized RT-DETR algorithm reaches an average accuracy of 96.3% on moving stacked fruit models at a detection speed of up to 67 FPS. These results indicate that the algorithm is effective at matching and recognizing occluded fruit targets and that it outperforms common algorithms for occluded-target recognition.
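The abstract names the Inner-MPDIoU loss but does not give its formula. The sketch below shows one common way to combine the two published ideas, and it is an assumption rather than the authors' implementation: the IoU is computed on auxiliary boxes scaled about their centers (Inner-IoU), and the MPDIoU corner-distance penalties, normalized by the squared image diagonal, are added on the original boxes. The function names, the `(x1, y1, x2, y2)` box layout, and the default `ratio=0.75` are all hypothetical choices made for illustration.

```python
def iou(a, b):
    """Plain IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def scale_box(box, ratio):
    """Auxiliary box: same center, width and height multiplied by `ratio`."""
    cx, cy = (box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0
    hw = (box[2] - box[0]) * ratio / 2.0
    hh = (box[3] - box[1]) * ratio / 2.0
    return (cx - hw, cy - hh, cx + hw, cy + hh)


def inner_mpdiou_loss(pred, target, img_w, img_h, ratio=0.75):
    """Assumed form: 1 - IoU_inner + d1^2 / (w^2 + h^2) + d2^2 / (w^2 + h^2).

    d1 and d2 are the distances between the top-left and the bottom-right
    corners of the ORIGINAL boxes; w and h are the input image size.
    """
    inner = iou(scale_box(pred, ratio), scale_box(target, ratio))
    norm = float(img_w ** 2 + img_h ** 2)
    d1_sq = (pred[0] - target[0]) ** 2 + (pred[1] - target[1]) ** 2
    d2_sq = (pred[2] - target[2]) ** 2 + (pred[3] - target[3]) ** 2
    return 1.0 - inner + d1_sq / norm + d2_sq / norm


if __name__ == "__main__":
    # A perfect prediction gives zero loss; a shifted one is penalized.
    print(inner_mpdiou_loss((0, 0, 10, 10), (0, 0, 10, 10), 100, 100))
    print(inner_mpdiou_loss((0, 0, 10, 10), (5, 0, 15, 10), 100, 100))
```

With `ratio` below 1 the auxiliary boxes shrink, so partially overlapping boxes score a lower inner IoU and the loss reacts more sharply to small misalignments. Whether the paper uses this ratio, or this exact combination of terms, is not stated in the abstract.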
Pages: 12