Lightweight object detection method for Lingwu long jujube images based on improved SSD

Cited by: 0
Authors
Wang Y. [1]
Xue J. [1]
Affiliation
[1] School of Mechanical Engineering, Ningxia University, Yinchuan
Source
Nongye Gongcheng Xuebao/Transactions of the Chinese Society of Agricultural Engineering | 2021 / Vol. 37 / No. 19
Keywords
DenseNet; Image processing; Inception module; Lingwu long jujubes; Object detection; Pre-train model; SSD model
DOI
10.11975/j.issn.1002-6819.2021.19.020
Abstract
In the intelligent harvesting of Lingwu long jujubes, picking robots work in complex environments, and both picking speed and on-device memory are limited. The visual recognition system therefore needs a lighter network structure together with higher detection accuracy. At present, almost all object detection methods load a pre-trained model because it offers good initialization and fast convergence. However, two challenges remain: 1) the network structure cannot be changed to fit the limited memory resources of the device; 2) the ImageNet dataset may differ greatly from the dataset to be trained, which degrades training. Taking the SSD model as the basic framework, this study proposes a lightweight object detection method for images of Lingwu long jujubes that achieves excellent performance without loading a pre-trained model. Firstly, data augmentation was performed on the 1 000 collected images to obtain 5 000 images; the operations included random cropping, random vertical or horizontal flipping, and random adjustment of brightness, contrast, and saturation. Secondly, the Lingwu long jujube dataset was established, comprising 3 500 training images and 1 500 test images. The images were captured with HUAWEI TRT-AL00A, Vivo Y79A, and Xiaomi 2014501 smartphones at resolutions of 3 016×4 032, 4 068×3 456, and 2 448×3 264, and were uniformly scaled to 300×300 to meet the input size required by the SSD model. The dataset follows the PASCAL VOC format.
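The augmentation and resizing steps listed above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the image representation (nested lists of RGB tuples), the crop fraction, the 0.8 to 1.2 jitter ranges, the 0.5 flip probabilities, and all function names are assumptions made for illustration. The sketch also ignores bounding boxes, which a real detection pipeline would have to crop, flip, and rescale together with the image.

```python
import random

# An image is a list of rows; each row is a list of (r, g, b) tuples in 0..255.

def clamp(v):
    """Round to an integer and clip to the valid 8-bit range."""
    return max(0, min(255, int(round(v))))

def flip_horizontal(img):
    return [row[::-1] for row in img]

def flip_vertical(img):
    return img[::-1]

def random_crop(img, rng, min_frac=0.8):
    # Crop a random window covering at least min_frac of each side
    # (min_frac is an assumed value, not taken from the paper).
    h, w = len(img), len(img[0])
    ch = rng.randint(max(1, int(h * min_frac)), h)
    cw = rng.randint(max(1, int(w * min_frac)), w)
    top = rng.randint(0, h - ch)
    left = rng.randint(0, w - cw)
    return [row[left:left + cw] for row in img[top:top + ch]]

def adjust_brightness(img, factor):
    # Scale every channel by the same factor.
    return [[tuple(clamp(c * factor) for c in px) for px in row] for row in img]

def adjust_contrast(img, factor):
    # Scale each channel's distance from the mean grey level of the image.
    n = sum(len(row) for row in img)
    mean = sum(sum(px) / 3 for row in img for px in row) / n
    return [[tuple(clamp(mean + (c - mean) * factor) for c in px) for px in row]
            for row in img]

def adjust_saturation(img, factor):
    # Scale each channel's distance from the pixel's own grey value.
    out = []
    for row in img:
        new_row = []
        for px in row:
            grey = sum(px) / 3
            new_row.append(tuple(clamp(grey + (c - grey) * factor) for c in px))
        out.append(new_row)
    return out

def resize_nearest(img, size=300):
    # Nearest-neighbour scaling to size x size (300 x 300 is the SSD input).
    h, w = len(img), len(img[0])
    return [[img[y * h // size][x * w // size] for x in range(size)]
            for y in range(size)]

def augment(img, rng):
    """Apply one random augmentation pass, then scale to 300 x 300."""
    img = random_crop(img, rng)
    if rng.random() < 0.5:
        img = flip_horizontal(img)
    if rng.random() < 0.5:
        img = flip_vertical(img)
    img = adjust_brightness(img, rng.uniform(0.8, 1.2))
    img = adjust_contrast(img, rng.uniform(0.8, 1.2))
    img = adjust_saturation(img, rng.uniform(0.8, 1.2))
    return resize_nearest(img, 300)

if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic 40 x 30 test image standing in for a photograph.
    sample = [[(x % 256, y % 256, 128) for x in range(40)] for y in range(30)]
    result = augment(sample, rng)
    print(len(result), len(result[0]))
```

In practice an image library would replace these pure-Python helpers; the sketch only shows the order and kind of operations named in the abstract.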
Labelling software was used to annotate the images, and the annotations were stored in the label folder in XML format. Thirdly, an improved DenseNet was built from Convolutional Block Attention Modules and two dense blocks with 6 and 8 convolution groups, respectively. With the improved DenseNet as the backbone network, the improved SSD model was obtained by combining it with a multi-level fusion structure and replacing the first three additional layers of the SSD model with Inception modules. Without loading a pre-trained model, the improved SSD model achieved an mAP of 96.60%, a detection speed of 28.05 frames/s, and 1.99×10^6 parameters. Its mAP was 2.02 and 0.05 percentage points higher than that of the SSD model and the pre-trained SSD model, respectively, and it had 11.14×10^6 fewer parameters than the SSD model, meeting the requirements of a lightweight network without a pre-trained model. These results can provide visual technical support for the intelligent harvesting of Lingwu long jujubes, and potentially for detection tasks on medical and multispectral images. © 2021, Editorial Department of the Transactions of the Chinese Society of Agricultural Engineering. All rights reserved.
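The reported figures imply several baseline values that the abstract does not state directly. The short calculation below derives them from the numbers given; the variable names are illustrative, and the memory estimate rests on an assumption of 4 bytes per parameter (float32 weights), which the abstract does not specify.

```python
# Figures reported in the abstract.
improved_map = 96.60            # mAP of the improved SSD, in percent
improved_params = 1.99e6        # parameter count of the improved SSD
fps = 28.05                     # detection speed, frames per second
gain_vs_ssd = 2.02              # percentage points above SSD
gain_vs_ssd_pretrained = 0.05   # percentage points above SSD (pre-train)
param_saving = 11.14e6          # parameters fewer than SSD

# Values derived from the figures above (not stated in the abstract).
ssd_map = round(improved_map - gain_vs_ssd, 2)
ssd_pretrained_map = round(improved_map - gain_vs_ssd_pretrained, 2)
ssd_params = improved_params + param_saving
reduction = param_saving / ssd_params
ms_per_frame = 1000 / fps

# Assumption: float32 weights, 4 bytes each; 1 MB taken as 10**6 bytes.
BYTES_PER_PARAM = 4
improved_mb = improved_params * BYTES_PER_PARAM / 1e6
ssd_mb = ssd_params * BYTES_PER_PARAM / 1e6

print(f"SSD mAP: {ssd_map:.2f}%")
print(f"SSD (pre-train) mAP: {ssd_pretrained_map:.2f}%")
print(f"SSD parameters: {ssd_params / 1e6:.2f} x 10^6")
print(f"Parameter reduction: {reduction:.1%}")
print(f"Time per frame: {ms_per_frame:.1f} ms")
print(f"Weight storage: {improved_mb:.2f} MB vs {ssd_mb:.2f} MB")
```

Under these figures the baseline SSD would have an mAP of about 94.58% and roughly 13.13×10^6 parameters, so the improved model removes about 85% of the parameters while taking about 35.7 ms per frame.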
Pages: 173-182
Page count: 9