Semantic segmentation of terrace image regions based on lightweight CNN-Transformer hybrid networks

Cited by: 0

Authors:

Liu X. [1]

Yi S. [1]

Li L. [1]

Cheng X. [1]

Wang C. [1]

Affiliation:

[1] School of Mechanical and Electrical Engineering, Chengdu University of Technology, Chengdu

Source:

Nongye Gongcheng Xuebao/Transactions of the Chinese Society of Agricultural Engineering | 2023, Vol. 39, No. 13

Keywords:

axial attention; image processing; lightweight model; semantic segmentation; terraced dataset

DOI:

10.11975/j.issn.1002-6819.202304025

Abstract:

Terracing has long been used in conventional cultivation to stabilize crop production and to conserve soil and water, and terrace construction is one of the most important measures for developing agricultural production. However, some terraces are at risk of being destroyed because of poor construction quality and inadequate management and maintenance. Fast and accurate detection of the distribution of terraced areas is therefore in high demand for food production, soil erosion control, and regional ecological planning. Unmanned aerial vehicle (UAV) aerial camera systems are widely used to obtain high-resolution remote sensing images in intelligent agriculture, and deep-learning-based semantic segmentation has advanced many fields along with the rapid development of information technology. In this study, a lightweight network model with an encoder-decoder structure was proposed, based on a CNN-Transformer hybrid architecture. Inspired by MobileViT, an axial attention mechanism was introduced into the MobileViT block, and the encoder of the model consisted of this improved MobileViT block. An inverted residual module was combined with strip pooling and an atrous spatial pyramid pooling (ASPP) module. The placement order of the modules was designed so that local and global visual representations interact, in order to obtain a complete global feature representation. Strip pooling was introduced to effectively capture long-range dependencies, so that high-level semantic feature maps could be extracted efficiently from a large amount of semantic information.
The atrous spatial pyramid pooling (ASPP) module was introduced to capture contextual information at multiple scales, enlarging the receptive field of the model to obtain a denser semantic feature map. PSPNet, LiteSeg, BiSeNetV2, DeepLabv3+, and MobileViT were selected for comparison experiments on the same test set. The results show that the improved model performed best in terms of accuracy and speed. More importantly, it recognized and delineated complex and irregular terrace regions in UAV images more accurately. Specifically, the pixel accuracy (PA) of the lightweight CNN-Transformer hybrid network model was 95.79%, the mean pixel accuracy (mPA) was 87.82%, the mean intersection over union (mIoU) was 80.91%, and the frequency-weighted intersection over union (FWIoU) was 94.86%. Furthermore, the improved model had only 8.32 M parameters, a small size, low computational complexity, and a frame rate of 51.91 frames per second, indicating a real-time and lightweight model. A comprehensive analysis of the performance indexes of each segmentation model showed that the lightweight CNN-Transformer hybrid network model achieved higher segmentation accuracy and faster inference with a small model size and low computational complexity. Therefore, the improved model can be expected to be deployed on UAVs and to fully meet the requirements of lightweight design, high accuracy, and low latency for mobile vision tasks. Semantic segmentation of the terrace area can be used to obtain the shape, location, and outline of terraces, and timely and accurate detection of terrace edges supports the prevention of damage to terraces and their reinforcement. At the same time, statistics on the cultivated area and extent of terraces can be expected to promote the development of terraces and agricultural construction in dry farming areas. © 2023 Chinese Society of Agricultural Engineering. All rights reserved.
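
The four accuracy figures reported in the abstract (pixel accuracy, mean pixel accuracy, mean intersection over union, and frequency-weighted intersection over union) are standard confusion-matrix metrics. The following is a minimal sketch that assumes the usual textbook definitions and is not the authors' evaluation code. The function name `segmentation_metrics` and the two-class confusion matrix are hypothetical and purely illustrative.

```python
def segmentation_metrics(confusion):
    """Compute PA, mPA, mIoU and FWIoU from a square confusion matrix.

    confusion[i][j] is the number of pixels whose true class is i and
    whose predicted class is j (assumed convention for this sketch).
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    tp = [confusion[i][i] for i in range(n)]                      # correctly labelled pixels per class
    true_count = [sum(confusion[i]) for i in range(n)]            # ground-truth pixels per class
    pred_count = [sum(confusion[i][j] for i in range(n)) for j in range(n)]  # predicted pixels per class

    # Pixel accuracy: fraction of all pixels that are labelled correctly.
    pa = sum(tp) / total

    # Mean pixel accuracy: per-class accuracy averaged over classes that occur.
    class_acc = [tp[i] / true_count[i] for i in range(n) if true_count[i] > 0]
    mpa = sum(class_acc) / len(class_acc)

    # Per-class IoU = TP / (ground truth + predicted - TP), skipping empty classes.
    iou = {}
    for i in range(n):
        union = true_count[i] + pred_count[i] - tp[i]
        if union > 0:
            iou[i] = tp[i] / union
    miou = sum(iou.values()) / len(iou)

    # Frequency-weighted IoU: each class IoU weighted by its ground-truth pixel share.
    fwiou = sum(true_count[i] / total * iou[i] for i in iou)

    return {"PA": pa, "mPA": mpa, "mIoU": miou, "FWIoU": fwiou}


if __name__ == "__main__":
    # Hypothetical 2-class example (background vs. terrace), not data from the paper.
    example = [[50, 10],
               [5, 35]]
    for name, value in segmentation_metrics(example).items():
        print(f"{name}: {value:.4f}")
```

Because FWIoU weights each class by its pixel frequency, it can be much higher than mIoU when a large, easily segmented class dominates the image, which is consistent with the gap between the reported 94.86% and 80.91%.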

Pages: 171-181

Page count: 10
