A scalable multi-modal learning fruit detection algorithm for dynamic environments

Cited by: 0
Authors
Mao, Liang [1, 2]
Guo, Zihao [1]
Liu, Mingzhe [2]
Li, Yue [2]
Wang, Linlin [1]
Li, Jie [1]
Affiliations
[1] Shenzhen Polytech Univ, Guangdong Hong Kong Macao Greater Bay Area Artific, Hong Kong, Guangdong, Peoples R China
[2] Univ Sci & Technol Liaoning, Sch Comp Sci & Software Engn, Anshan, Peoples R China
Source
FRONTIERS IN NEUROROBOTICS | 2025 / Vol. 18
Keywords
multi-modal learning; machine learning; fruit recognition; deep learning; object detection; REPRESENTATION; FUSION
DOI
10.3389/fnbot.2024.1518878
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Introduction: To improve the detection of litchi fruits in natural scenes and to address challenges such as dense occlusion and small-target identification, this paper proposes a novel multimodal object detection method, denoted YOLOv5-Litchi.

Methods: First, the neck of YOLOv5s is simplified by changing its FPN+PAN structure to an FPN structure, and the number of detection heads is increased from 3 to 5. The 80 × 80 and 160 × 160 detection heads are replaced by TSCD detection heads to strengthen the model's ability to detect small targets. Next, the localization loss is replaced with the EIoU loss and the confidence loss with VFLoss, which further improves bounding-box accuracy and reduces missed detections of occluded targets. Finally, a sliding-slice method is used at prediction time to reduce the miss rate for small targets.

Results: Compared with the original YOLOv5s model, the proposed model improves accuracy, recall, and mean average precision (mAP) by 9.5, 0.9, and 12.3 percentage points, respectively. Benchmarked against other models such as YOLOX, YOLOv6, and YOLOv8, its AP is higher by 4.0, 6.3, and 3.7 percentage points, respectively.

Discussion: The improved network's gains are concentrated in recall and AP: it misses fewer targets and produces more accurate predicted bounding boxes, which indicates that it is well suited to litchi fruit detection. The method therefore substantially improves the detection accuracy of mature litchi fruits, effectively addresses dense occlusion and small-target detection, and provides technical support for subsequent litchi yield estimation.
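The abstract names the EIoU loss but does not define it. As a minimal sketch, assuming the standard EIoU formulation (1 - IoU, plus the squared centre distance normalized by the squared diagonal of the smallest enclosing box, plus separate width and height penalties normalized by that box's width and height), the loss for one pair of boxes can be computed as below. The function name `eiou_loss` and the corner-format (x1, y1, x2, y2) boxes are illustrative assumptions, not the authors' code.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed corner format: (x1, y1, x2, y2)


def eiou_loss(pred: Box, target: Box, eps: float = 1e-9) -> float:
    """Hypothetical helper: L = 1 - IoU + rho^2/c^2 + dw^2/Cw^2 + dh^2/Ch^2."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Intersection and union areas -> IoU.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Smallest enclosing box: width Cw, height Ch, squared diagonal c^2.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    c2 = cw ** 2 + ch ** 2

    # Squared distance between the two box centres.
    rho2 = ((px1 + px2) / 2 - (tx1 + tx2) / 2) ** 2 + ((py1 + py2) / 2 - (ty1 + ty2) / 2) ** 2

    return (1.0 - iou
            + rho2 / (c2 + eps)                  # centre-distance penalty
            + (pw - tw) ** 2 / (cw ** 2 + eps)   # width penalty
            + (ph - th) ** 2 / (ch ** 2 + eps))  # height penalty
```

Unlike plain IoU loss, the width and height terms still produce a useful signal when boxes do not overlap: for a unit box at x = 0 and an identical one at x = 2, IoU is 0 but the centre term adds 4/10, giving a loss of 1.4.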
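VFLoss is not expanded in this record; it most likely refers to the varifocal loss from VarifocalNet, and the sketch below rests on that assumption. The target q is an IoU-aware score (the IoU with the matched ground truth for positives, 0 for negatives). Positives are weighted by q itself, while negatives are down-weighted by alpha * p**gamma so that easy background predictions contribute little. The defaults alpha = 0.75 and gamma = 2.0 are the values reported for VarifocalNet, not settings confirmed for YOLOv5-Litchi, and `varifocal_loss` is a hypothetical name.

```python
import math


def varifocal_loss(p: float, q: float, alpha: float = 0.75,
                   gamma: float = 2.0, eps: float = 1e-12) -> float:
    """Hypothetical per-prediction varifocal loss.

    p: predicted IoU-aware classification score in (0, 1).
    q: target score; IoU with the matched ground-truth box for positives, 0 for negatives.
    """
    p = min(max(p, eps), 1.0 - eps)  # clamp so both logarithms stay finite
    if q > 0:
        # Positive: binary cross-entropy against the soft target q, scaled by q.
        return -q * (q * math.log(p) + (1.0 - q) * math.log(1.0 - p))
    # Negative: focal-style down-weighting by alpha * p**gamma.
    return -alpha * p ** gamma * math.log(1.0 - p)
```

The asymmetry is the point of the design: confident false positives on background (large p, q = 0) are penalized heavily, while high-quality positives (large q) dominate the gradient among positives.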
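The abstract does not give the slice size, overlap, or merge rule used by the sliding-slice method. The sketch below assumes square 640-pixel tiles with 20% overlap (hypothetical values) and shows how tile windows could be laid out so that the last column and row sit flush with the image border, and how a box detected inside a tile is shifted back into full-image coordinates. All function names are illustrative.

```python
from typing import List, Tuple

Window = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in full-image pixels


def slice_positions(length: int, tile: int, overlap: float) -> List[int]:
    """Start offsets along one axis so tiles of size `tile` cover [0, length)."""
    if length <= tile:
        return [0]
    step = max(1, round(tile * (1.0 - overlap)))
    starts = list(range(0, length - tile + 1, step))
    if starts[-1] + tile < length:
        starts.append(length - tile)  # extra tile flush with the far border
    return starts


def sliding_slices(width: int, height: int, tile: int = 640,
                   overlap: float = 0.2) -> List[Window]:
    """Row-major list of tile windows covering a width x height image."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in slice_positions(height, tile, overlap)
        for x in slice_positions(width, tile, overlap)
    ]


def to_image_coords(box: Tuple[float, float, float, float], window: Window):
    """Shift a tile-local (x1, y1, x2, y2) box into full-image coordinates."""
    ox, oy = window[0], window[1]
    return (box[0] + ox, box[1] + oy, box[2] + ox, box[3] + oy)
```

For a 1920 × 1080 image this yields eight windows: columns start at x = 0, 512, 1024, and 1280, and rows at y = 0 and 440. Each small fruit is therefore seen at the detector's native input scale instead of being shrunk with the whole image.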
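Because neighbouring slices overlap, the same fruit can be detected in more than one tile. One common way to merge such duplicates, assumed here rather than taken from the paper, is greedy non-maximum suppression (NMS) over all detections after they have been mapped back to full-image coordinates. The 0.5 IoU threshold and the helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

Det = Tuple[float, float, float, float, float]  # (x1, y1, x2, y2, score)


def box_iou(a: Sequence[float], b: Sequence[float]) -> float:
    """IoU of two corner-format boxes (extra trailing fields are ignored)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_tile_detections(dets: Sequence[Det], iou_thr: float = 0.5) -> List[Det]:
    """Greedy NMS: keep the highest-scoring box, drop later boxes that overlap it."""
    kept: List[Det] = []
    for d in sorted(dets, key=lambda d: d[4], reverse=True):
        if all(box_iou(d, k) < iou_thr for k in kept):
            kept.append(d)
    return kept
```

For example, two boxes of the same fruit from adjacent tiles, (100, 100, 150, 150) and (102, 100, 152, 150), overlap with an IoU of about 0.92, so only the higher-scoring one survives, while a separate fruit elsewhere in the image is kept.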
Pages: 16