Unsupervised depth estimation modeling for tomato plant image based on dense convolutional auto-encoder

Cited by: 0
Authors
Zhou Y. [1 ]
Deng H. [1 ]
Xu T. [1 ]
Miao T. [1 ]
Wu Q. [1 ]
Affiliations
[1] College of Information and Electrical Engineering, Shenyang Agricultural University, Shenyang
Source
Nongye Gongcheng Xuebao/Transactions of the Chinese Society of Agricultural Engineering | 2020 / Vol. 36 / No. 11
Keywords
Algorithms; Auto-encoder; Convolution neural network; Deep learning; Depth estimation; Disparity; Image processing; Tomato; Unsupervised learning;
DOI
10.11975/j.issn.1002-6819.2020.11.021
Abstract
Acquiring depth information is key to enabling mobile robots to operate autonomously in greenhouses. This study proposed an unsupervised depth-estimation model, based on a dense convolutional auto-encoder, that used binocular images for both training and testing. The model enabled a neural network to estimate depth from plant images, with a loss function defined by convolutional feature comparison and regularization constraints. To address pixels that vanish between views because of perspective differences and occlusion, a disparity-confidence prediction was introduced to suppress the problem gradients caused by the image-reconstruction loss. Meanwhile, a dense block was designed based on separable convolution, and a convolutional auto-encoder built from these blocks served as the backbone network of the model. A large number of binocular images were collected in a tomato greenhouse during the growing season under overcast, cloudy, and sunny conditions. The unsupervised plant-image depth-estimation network was implemented with the Python interface of Microsoft Cognitive Toolkit (CNTK) v2.7, a deep-learning framework. Training and testing experiments, using image feature similarity, depth-estimation error, and threshold precision as criteria, were carried out on the binocular tomato images with a Tesla K40c graphics device. The results showed that, compared with regular convolution, the auto-encoder based on separable-convolution dense blocks effectively reduced the number of network weight parameters. Among the activations compared, including ReLU (Rectified Linear Unit), Param-ReLU, ELU (Exponential Linear Unit), and SELU (Scaled ELU), the network model with Leaky-ReLU as the nonlinear transformation achieved the minimum depth error and the maximum threshold precision.
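The parameter saving from separable convolution mentioned above can be illustrated with a simple count. This is a minimal sketch with illustrative kernel and channel sizes (the paper's exact dense-block configuration is not given in the abstract); the function names are hypothetical.

```python
# Weight-parameter counts: regular vs. depthwise-separable convolution (bias ignored).

def regular_conv_params(k, c_in, c_out):
    """A standard k x k convolution mixes all input channels per output filter."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """Depthwise k x k convolution (one spatial filter per input channel)
    followed by a 1 x 1 pointwise convolution that mixes channels."""
    return k * k * c_in + c_in * c_out

# Illustrative layer: 3 x 3 kernel, 64 input channels, 64 output channels.
k, c_in, c_out = 3, 64, 64
r = regular_conv_params(k, c_in, c_out)    # 36864 weights
s = separable_conv_params(k, c_in, c_out)  # 4672 weights
print(r, s, round(s / r, 3))               # separable uses ~12.7% of the weights
```

The saving grows with the number of channels, which is why stacking such blocks densely keeps the backbone compact.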
The results also showed that the network structure had a significant impact on the accuracy of the predicted disparity. Introducing separable-convolution dense blocks into the skip connections between the encoder and decoder of the auto-encoder improved depth-estimation accuracy to a certain extent. Meanwhile, having the depth-estimation model predict a disparity confidence, used to restrain the backpropagation of problem gradients, remarkably decreased the depth-estimation error: Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) were reduced by 55.2% and 33.0%, respectively. Accuracy was further improved by several processing strategies: performing image reconstruction and loss computation after up-sampling the disparity map to the input-image scale, and concatenating each multi-scale disparity map predicted by the network with the corresponding encoder feature map before passing the combined feature map to the next prediction module. Increasing the depth and width of the convolutional auto-encoder also improved depth-estimation performance. The error decreased significantly as the depth of spatial points decreased: when the spatial-point depth was within 9 m, the MAE of the estimated depth was less than 14.1 cm, and within 3 m it was less than 7 cm. Illumination conditions had no significant influence on the accuracy of the model, indicating that the method is robust to changes in the luminous environment. The highest test speed of the model was 14.2 FPS (frames per second), which is near real-time. Compared with existing research, the mean relative error, MAE, and Mean Range Error (MRE) of depth estimation in this study were reduced by 46.0%, 26.0%, and 25.5%, respectively. This research can provide a reference for designing the vision systems of greenhouse mobile robots.
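The quantities reported above can be made concrete. The sketch below shows the standard stereo geometry that turns a predicted disparity into depth, plus the three evaluation metrics named in the abstract (MAE, RMSE, and threshold precision); the focal length, baseline, and the 1.25 threshold are illustrative assumptions, not values from the paper.

```python
import math

def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Stereo geometry: depth = focal length * baseline / disparity."""
    return focal_px * baseline_m / disparity_px

def mae(pred, true):
    """Mean Absolute Error over paired depth values."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(pred)

def rmse(pred, true):
    """Root Mean Square Error over paired depth values."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred))

def threshold_precision(pred, true, thr=1.25):
    """Fraction of points whose depth ratio max(p/t, t/p) is below thr."""
    ok = sum(1 for p, t in zip(pred, true) if max(p / t, t / p) < thr)
    return ok / len(pred)

# Illustrative: 800 px focal length, 0.12 m baseline, 32 px disparity -> 3.0 m.
print(disparity_to_depth(32, 800, 0.12))
```

Threshold precision rewards predictions within a multiplicative band of the ground truth, so it complements the absolute-error measures at both near and far range.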
© 2020, Editorial Department of the Transactions of the Chinese Society of Agricultural Engineering. All rights reserved.
Pages: 182-192
Page count: 10