Real-time semantic segmentation method for field grapes based on channel feature pyramid

Cited by: 0
Authors
Sun J. [1 ]
Gong D. [1 ]
Yao K. [1 ]
Lu B. [1 ]
Dai C. [1 ]
Wu X. [1 ]
Affiliations
[1] School of Electrical and Information Engineering, Jiangsu University, Zhenjiang
Source
Nongye Gongcheng Xuebao/Transactions of the Chinese Society of Agricultural Engineering | 2022, Vol. 38, No. 17
Keywords
CFP; grape; image recognition; machine vision; real-time; semantic segmentation;
DOI
10.11975/j.issn.1002-6819.2022.17.016
Abstract
Automated, intelligent harvesting has become an urgent task in the grape industry. However, current fruit-recognition models struggle to balance accuracy and real-time performance. In this study, a lightweight, real-time semantic segmentation model based on a channel feature pyramid (CFP) was proposed for field grape harvesting. Firstly, a publicly available instance segmentation dataset of field grapes was used as the experimental object. A total of 300 grape images were collected under different pruning periods, lighting conditions, and maturity levels. The LabelMe annotation tool was used to build the field grape dataset, with four annotated classes: background, leaves, grapes, and stems. The dataset was then expanded by random augmentation to a total of 1200 images. Since the original images were too large in pixels to be trained on directly, the image resolution was uniformly compressed to 512×512 pixels to improve the training efficiency of the network model. Secondly, because grape size and location vary greatly, convolutional kernels with different receptive field sizes were arranged in the channel feature pyramid module, which was used for feature extraction. Multi-scale features at the 3×3, 5×5, and 7×7 scales were extracted by skip-connected 1×3 and 3×1 dilated convolutions within a single channel, so that multi-scale and contextual features were effectively captured from the grape images. At the same time, the model parameters were reduced while preserving trainable capacity, limiting information loss. During down-sampling, a convolution-pooling fusion structure was used instead of the traditional max-pooling structure. In the decoding part, skip connections were employed to fuse information from different feature layers and recover image details. Finally, the improved model was tested on the grape test set.
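The multi-scale design described above can be illustrated with a small calculation: a 3×3 kernel at dilation rates 1, 2, and 3 covers 3×3, 5×5, and 7×7 receptive fields, and factorizing it into 1×3 and 3×1 convolutions reduces the per-layer weight count. The sketch below uses the standard receptive-field formula for dilated convolutions; it is illustrative only, not the authors' implementation.

```python
def receptive_field(kernel: int, dilation: int) -> int:
    """Receptive field along one axis of a dilated convolution: d*(k-1)+1."""
    return dilation * (kernel - 1) + 1

def conv_params(kh: int, kw: int, channels: int) -> int:
    """Weight count of a kh x kw convolution mapping `channels` feature
    maps to `channels` feature maps (biases ignored)."""
    return kh * kw * channels * channels

# A 3x3 kernel at dilation rates 1, 2, 3 yields the 3x3, 5x5, and 7x7
# scales described in the abstract.
scales = [receptive_field(3, d) for d in (1, 2, 3)]
print(scales)  # [3, 5, 7]

# Factorizing a 3x3 convolution into a 1x3 followed by a 3x1 convolution
# saves a third of the weights per layer (6*C^2 instead of 9*C^2).
full = conv_params(3, 3, 64)
factorized = conv_params(1, 3, 64) + conv_params(3, 1, 64)
print(full, factorized)  # 36864 24576
```

This is how the module can keep large receptive fields while shrinking the parameter budget, consistent with the 4.88 MB model size reported below.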
The experimental results showed a Mean Intersection over Union (MIoU) of 78.8%, a Mean Pixel Accuracy (MPA) of 90.3%, and a real-time processing speed of 68.56 frames/s, with a model size of only 4.88 MB. Compared with the real-time semantic segmentation networks BiSeNet, ENet, and DFANet, the MIoU was improved by 7.9, 5.7, and 10.5 percentage points, respectively. Compared with lightweight networks using MobileNetV3 and Inception as encoders, the accuracy of the improved model increased by 1.2 and 8.8 percentage points, respectively. Therefore, the proposed network presented a significant advantage in segmentation accuracy over the real-time and lightweight networks. Compared with the classical networks Deeplabv3+, SegNet, and UNet, the MIoU was 2.3, 2.0, and 3.7 percentage points lower, respectively, but the model size was only 12.3%, 4.1%, and 7.4% of theirs, respectively. The improved model fully met the real-time requirement while maintaining a good tradeoff between speed and accuracy. It can be expected to serve in the segmentation and recognition of field grapes in smart agriculture, and the findings can also provide technical support for the visual recognition systems of grape-picking robots. © 2022 Chinese Society of Agricultural Engineering. All rights reserved.
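The reported MIoU and MPA follow the standard definitions over a per-class confusion matrix (IoU = TP/(TP+FP+FN) per class, pixel accuracy = TP/(TP+FN) per class, each averaged over classes). A minimal pure-Python sketch of these metrics, not the paper's evaluation code, over the four annotated classes:

```python
def confusion_matrix(true, pred, num_classes):
    """Rows are ground-truth classes, columns are predicted classes."""
    m = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true, pred):
        m[t][p] += 1
    return m

def mean_iou(m):
    """Mean Intersection over Union: average of TP / (TP + FP + FN)."""
    n = len(m)
    ious = []
    for c in range(n):
        tp = m[c][c]
        fn = sum(m[c]) - tp
        fp = sum(m[r][c] for r in range(n)) - tp
        denom = tp + fp + fn
        ious.append(tp / denom if denom else 0.0)
    return sum(ious) / n

def mean_pixel_accuracy(m):
    """Mean Pixel Accuracy: average per-class recall TP / (TP + FN)."""
    n = len(m)
    accs = [m[c][c] / sum(m[c]) if sum(m[c]) else 0.0 for c in range(n)]
    return sum(accs) / n

# Toy example with the paper's four classes:
# 0 = background, 1 = leaves, 2 = grapes, 3 = stems.
true = [0, 0, 1, 1, 2, 2, 3, 3]
pred = [0, 0, 1, 2, 2, 2, 3, 0]
m = confusion_matrix(true, pred, 4)
print(round(mean_iou(m), 3), round(mean_pixel_accuracy(m), 3))  # 0.583 0.75
```

In practice the same formulas are applied to the flattened pixel labels of every test image at once.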
Pages: 150-157 (7 pages)
References: 26