Method for estimating the image depth of tomato plant based on self-supervised learning

Cited by: 0
Authors
Zhou Y. [1]
Xu T. [1]
Deng H. [1]
Miao T. [1]
Wu Q. [1]
Affiliations
[1] College of Information and Electrical Engineering, Shenyang Agricultural University, Shenyang
Source
Nongye Gongcheng Xuebao/Transactions of the Chinese Society of Agricultural Engineering | 2019 / Vol. 35 / No. 24
Keywords
Algorithms; Convolution neural network; Deep learning; Depth estimation; Disparity; Image processing; Self-supervised learning; Tomato;
DOI
10.11975/j.issn.1002-6819.2019.24.021
Abstract
Depth estimation is critical to 3D reconstruction and object location in intelligent agricultural machinery vision systems, and a common approach is stereo matching. Traditional stereo matching methods rely on low-level image features extracted manually. Because the color and texture of field plant images are nonuniform, such hand-crafted features are poorly distinguishable and prone to mismatching, which compromises the accuracy of the resulting depth map. While a supervised-learning-based convolutional neural network (CNN) can estimate the depth of each pixel in a plant image directly, annotating depth data is expensive. In this paper, we present a depth estimation model based on self-supervised learning for phenotyping the tomato canopy. The depth estimation task was cast as image reconstruction: dense disparity maps were estimated indirectly using a rectified stereo pair of images as the network input, and bilinear interpolation was used to sample the input images and reconstruct the warped images. We developed three channel-wise group convolution (CWGC) modules, namely the dimension-invariable convolution module, the down-sampling convolution module and the up-sampling convolution module, and used them to construct the convolutional auto-encoder, the key infrastructure of the depth estimation method. Because manual features are insufficient for comparing image similarity, we used the loss in image convolutional feature similarity as one objective of the network training, and a CWGC-based CNN classification network (CWGCNet) was developed to extract low-level features automatically. In addition to the loss in image convolutional feature similarity, the total training loss included the image appearance matching loss, the disparity smoothness loss and the left-right disparity consistency loss. Stereo pairs of tomato images were captured with a binocular camera in a greenhouse; after epipolar rectification, the image pairs were used for training and testing of the depth estimation model. Using the Microsoft Cognitive Toolkit (CNTK), the CWGCNet and the depth estimation network for the tomato images were implemented in Python, and both training and testing were conducted on a computer with a Tesla K40c GPU (graphics processing unit). The results showed that the shallow convolutional layers of the CWGCNet successfully extracted diverse low-level image features for calculating the loss in image convolutional feature similarity. The convolutional auto-encoder developed in this paper significantly improved the disparity map estimated by the depth estimation model. The loss in image convolutional feature similarity had a remarkable effect on the accuracy of the image depth, and the accuracy of the estimated disparity map increased with the number of convolution modules used to calculate this loss. When sampled within 9.0 m, the root mean square error (RMSE) and the mean absolute error (MAE) of the corner distance estimated by the model were less than 2.5 cm and 1.8 cm, respectively, while within 3.0 m the corresponding errors were less than 0.7 cm and 0.5 cm. The coefficient of determination (R2) of the proposed model was 0.8081, and the test speed was 28 fps (frames per second). Compared with existing models, the proposed model reduced the RMSE and MAE by 33.1% and 35.6% respectively, while increasing the calculation speed by 52.2%. © 2019, Editorial Department of the Transactions of the Chinese Society of Agricultural Engineering. All rights reserved.
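The self-supervised signal described in the abstract is image reconstruction: the network predicts a dense disparity map and one view is re-synthesized from the other by bilinear sampling. The sketch below illustrates only that warping step; it is not the authors' CNTK implementation, and the use of PyTorch, the function name, and the disparity convention (per-pixel left-view disparities for a rectified pair) are assumptions made purely for illustration.

```python
import torch
import torch.nn.functional as F

def warp_right_to_left(right_img, disparity):
    """Reconstruct the left view by sampling the right image at x - d.

    right_img: (B, 3, H, W); disparity: (B, 1, H, W) left-view disparities
    in pixels (hypothetical convention for this sketch).
    """
    b, _, h, w = right_img.shape
    # Base sampling grid in the normalized [-1, 1] coordinates grid_sample expects.
    ys, xs = torch.meshgrid(
        torch.linspace(-1.0, 1.0, h, device=right_img.device),
        torch.linspace(-1.0, 1.0, w, device=right_img.device),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(b, -1, -1, -1).clone()
    # Shift the x-coordinates left by the disparity (converted to normalized units),
    # then bilinearly sample the right image to synthesize the warped left image.
    grid[..., 0] = grid[..., 0] - 2.0 * disparity.squeeze(1) / (w - 1)
    return F.grid_sample(right_img, grid, mode="bilinear",
                         padding_mode="border", align_corners=True)
```

The warped image can then be compared with the original left image by the appearance matching and convolutional feature similarity losses mentioned above, which is what makes the training self-supervised.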
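The abstract names three channel-wise group convolution (CWGC) modules (dimension-invariable, down-sampling and up-sampling) used to build the convolutional auto-encoder. The following is a minimal sketch of how such modules could be expressed with grouped convolutions; the kernel sizes, group count, and batch-normalization/ReLU pairing are assumptions, not the paper's exact design.

```python
import torch.nn as nn

def cwgc_block(in_ch, out_ch, groups=4, stride=1, upsample=False):
    """Channel-wise group convolution block: dimension-invariable (stride=1),
    down-sampling (stride=2), or up-sampling (upsample=True)."""
    if upsample:
        conv = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=3, stride=2,
                                  padding=1, output_padding=1, groups=groups)
    else:
        conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride,
                         padding=1, groups=groups)
    return nn.Sequential(conv, nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

# Example: an encoder stage that halves the spatial resolution, a bottleneck
# that preserves it, and a decoder stage that restores it, as in a
# convolutional auto-encoder (channel widths here are illustrative).
encoder_stage = cwgc_block(32, 64, stride=2)        # down-sampling module
bottleneck    = cwgc_block(64, 64, stride=1)        # dimension-invariable module
decoder_stage = cwgc_block(64, 32, upsample=True)   # up-sampling module
```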
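The reported accuracy figures use standard error metrics. The helper below, assuming NumPy and illustrative variable names, shows how the RMSE, MAE and coefficient of determination (R2) quoted above are conventionally computed from estimated and measured distances.

```python
import numpy as np

def evaluate(estimated, measured):
    """Return RMSE, MAE and R2 of estimated vs. measured distances."""
    estimated = np.asarray(estimated, dtype=float)
    measured = np.asarray(measured, dtype=float)
    err = estimated - measured
    rmse = np.sqrt(np.mean(err ** 2))                     # root mean square error
    mae = np.mean(np.abs(err))                            # mean absolute error
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((measured - measured.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot                            # coefficient of determination
    return rmse, mae, r2
```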
Pages: 173-182
Number of pages: 9