Depth estimation of apple tree in single image using improved HRNet

Cited by: 0

Authors:

Long Y. [1]

Gao Y. [1]

Zhang G. [1]

Affiliations:

[1] College of Mechanical and Electronic Engineering, Northwest A&F University

[2] Key Laboratory of Agricultural Internet of Things, Ministry of Agriculture and Rural Affairs

[3] Shaanxi Key Laboratory of Agricultural Information Perception and Intelligent Service, Yangling

Source:

Nongye Gongcheng Xuebao/Transactions of the Chinese Society of Agricultural Engineering | 2022 / Vol. 38 / No. 23

Keywords:

apple tree; convolutional block attention module; deep learning; dense connection mechanism; image processing; single image depth estimation; stripe refinement module

DOI:

10.11975/j.issn.1002-6819.2022.23.013

Abstract:

Accurate and rapid depth estimation of apple trees is widely applicable to precise fruit positioning and autonomous robotic harvesting. In this study, an improved High-Resolution Network (HRNet) was proposed to estimate the depth of apple trees in real scenes from a single RGB image, as required for mechanized apple picking. Firstly, a multi-branch parallel encoder network based on HRNet was constructed to extract multi-scale features, and a dense connection mechanism was introduced to enhance continuity in the feature transfer process. Secondly, the Convolutional Block Attention Module (CBAM) was used to recalibrate the fused feature maps at the channel and pixel levels, in order to reduce the noise interference caused by redundant features. The different weight distributions of the feature maps were thereby learned to enhance the structural information. In the decoder network, the Stripe Refinement Module (SRM) was used to gather boundary pixels along the orthogonal horizontal and vertical directions. The boundary details of the feature map were adaptively optimized to highlight edge features, which reduced blurry edges in the predicted images. Finally, up-sampling was used to generate predicted depth images of the same size as the RGB images. An image acquisition platform was constructed to collect RGB and depth images of apple orchards at different times. The data were then augmented using horizontal mirroring, color jitter, and random rotation. After augmentation, 3374 orchard RGB and depth images were obtained for the depth dataset. A series of experiments was conducted on the NYU Depth V2 dataset and the orchard depth dataset. Ablation experiments were first performed on HRNet variants with different degrees of improvement. The predictive performance of the improved variants was significantly better than that of the original HRNet, indicating that adding the dense connection mechanism, CBAM, and SRM improved model performance. Secondly, in a comparison with current mainstream networks, the improved HRNet achieved a mean relative error (MRE), root mean square error (RMS), logarithmic mean error, depth edge accuracy error, and edge integrity error of 0.123, 0.547, 0.051, 3.90 and 10.59, respectively, on the orchard depth dataset. The accuracy reached 0.850, 0.975 and 0.993 at three different thresholds, respectively. In terms of subjective visual quality, the depth maps generated by the improved HRNet showed more accurate spatial resolution. The improved network better presented the distribution of depth information in the image, with clearer edges and more texture details. More importantly, the depth information of some small objects was also recovered, giving the best overall result and the closest match to the real depth map. The ablation analysis and the subjective and objective comparisons together demonstrated the effectiveness of the improved network for depth estimation. The experiments also verified that the proposed network outperformed the compared networks in both visual quality and objective measurement on the NYU Depth V2 and orchard depth datasets. The findings can provide a new approach to obtaining depth information for automatic apple picking machines. © 2022 Chinese Society of Agricultural Engineering. All rights reserved.
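
The abstract reports its objective measures by name only. As a rough illustration, the sketch below computes the mean relative error, root mean square error, mean log10 error, and threshold accuracy under their conventional definitions in the monocular depth estimation literature. The function name `depth_metrics`, the thresholds 1.25, 1.25^2 and 1.25^3, and the exact formulas are assumptions, since the record does not state them; the two edge-based errors are omitted because their definitions are not given here.

```python
import math


def depth_metrics(pred, gt, thresholds=(1.25, 1.25 ** 2, 1.25 ** 3)):
    """Conventional depth metrics over paired, strictly positive depths.

    Hypothetical helper: the formulas and thresholds follow common practice
    and are assumed, not taken from the paper.
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and equally long")
    n = len(pred)
    pairs = list(zip(pred, gt))

    # Mean relative error: mean(|pred - gt| / gt)
    mre = sum(abs(p - g) / g for p, g in pairs) / n
    # Root mean square error: sqrt(mean((pred - gt)^2))
    rms = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    # Mean absolute difference of log10 depths
    log10 = sum(abs(math.log10(p) - math.log10(g)) for p, g in pairs) / n
    # Threshold accuracy: share of pixels with max(pred/gt, gt/pred) < t
    ratios = [max(p / g, g / p) for p, g in pairs]
    acc = [sum(r < t for r in ratios) / n for t in thresholds]

    return {"mre": mre, "rms": rms, "log10": log10, "acc": acc}


if __name__ == "__main__":
    # Toy values for illustration only, not data from the paper.
    print(depth_metrics([1.0, 2.0, 4.0], [1.0, 2.5, 2.0]))
```

In practice the inputs would be the flattened valid pixels of a predicted and a ground-truth depth map, and an array library would replace the plain Python loops.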

Pages: 122-129

Page count: 7
