Visual detection of moving stacked objects based on efficient multi-scale grouping and improved multi-head self-attention

Cited by: 0

Authors:

Fei, Sheng-wei [1]

Zhang, Hao-jie [1]

Affiliations:

[1] Donghua Univ, Coll Mech Engn, Shanghai 201620, Peoples R China

Source:

MEASUREMENT SCIENCE AND TECHNOLOGY | 2025 / Vol. 36 / No. 03

Keywords:

deep learning; target detection; RT-DETR; attention mechanism; cascade packet;

DOI:

10.1088/1361-6501/adb16e

Chinese Library Classification (CLC) number:

T [Industrial Technology];

Discipline classification code:

08;

Abstract:

To address the problem of stacked fruit occluding target fruit during routine fruit sorting, this paper presents an object detection algorithm that combines efficient multi-scale grouping with an improved multi-head self-attention mechanism. The algorithm uses the real-time detection transformer (RT-DETR) as its baseline to simplify optimization and improve robustness. We introduce efficient multi-scale attention to preserve channel information, optimize the multi-head self-attention, and adopt cascade grouping to reduce computational redundancy. We also use a new loss function, Inner-MPDIoU, which combines the bounding box similarity metric MPDIoU with the Inner idea, to improve the accuracy of detecting moving occluded targets. Experimental results show that the optimized RT-DETR algorithm achieves an average accuracy of 96.3% when detecting moving stacked fruit models, at a detection speed of up to 67 FPS. These results confirm the effectiveness of the algorithm in matching and recognizing occluded fruit targets, and it outperforms common algorithms for recognizing occluded targets.
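The abstract names the Inner-MPDIoU loss but does not give its formula. The sketch below is one plausible reading, not the authors' implementation. It assumes the commonly published definitions: MPDIoU penalizes the squared distances between the top-left and bottom-right corners of the two boxes, normalized by the squared image diagonal, and the Inner variant computes IoU on auxiliary boxes scaled about each box center. The function names, the (x1, y1, x2, y2) box layout, and the default ratio of 0.75 are illustrative assumptions.

```python
def iou(a, b):
    """Plain IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def scale_box(box, ratio):
    """Auxiliary box: same center, width and height scaled by `ratio`."""
    cx, cy = (box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0
    half_w = (box[2] - box[0]) * ratio / 2.0
    half_h = (box[3] - box[1]) * ratio / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def inner_mpdiou_loss(pred, gt, img_w, img_h, ratio=0.75):
    """Assumed form: 1 - IoU_inner + d1^2 / (w^2 + h^2) + d2^2 / (w^2 + h^2).

    d1 and d2 are the distances between the top-left and bottom-right
    corners of the original (unscaled) boxes; w and h are the image size.
    """
    inner_iou = iou(scale_box(pred, ratio), scale_box(gt, ratio))
    norm = float(img_w ** 2 + img_h ** 2)
    d1_sq = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2
    d2_sq = (pred[2] - gt[2]) ** 2 + (pred[3] - gt[3]) ** 2
    return 1.0 - inner_iou + d1_sq / norm + d2_sq / norm


if __name__ == "__main__":
    # A perfect prediction gives zero loss; disjoint boxes exceed 1.
    print(inner_mpdiou_loss((10, 10, 50, 50), (10, 10, 50, 50), 640, 640))
    print(inner_mpdiou_loss((0, 0, 10, 10), (100, 100, 110, 110), 640, 640))
```

Under these assumptions, the corner-distance terms keep the loss informative even when the boxes do not overlap, and a ratio below 1 shrinks the auxiliary boxes so that the IoU term reacts more sharply to small misalignments.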


Pages: 12
