Interactive Image Segmentation Based on Fusion of Two-Stage Feature and Transformer Encoder

Cited by: 0

Authors:

Feng, Jun [1]

Zhang, Tian [1]

Shi, Yichen [1]

Wang, Hui [1]

Hu, Jingjing [2]

Affiliations:

[1] School of Information Science and Technology, Shijiazhuang Tiedao University, Shijiazhuang

[2] School of Computer Science and Technology, Beijing Institute of Technology, Beijing

Source:

Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics | 2024, Vol. 36, No. 06

Keywords:

deep learning; interactive feature fusion; interactive image segmentation; lightweight network; Transformer encoder

DOI:

10.3724/SP.J.1089.2024.19922

Abstract:

To segment the foreground objects that users are interested in quickly and accurately, and to obtain high-quality annotated segmentation data at low cost, an interactive image segmentation algorithm based on two-stage feature fusion and a Transformer encoder is proposed. First, a lightweight Transformer backbone network extracts multi-scale feature encodings from the input image, which makes better use of context information. Then, subjective prior knowledge is introduced through click interaction, and the interactive features are fused into the Transformer network in two successive stages, a primary stage and an enhanced stage. Finally, atrous convolution, an attention mechanism and a multi-layer perceptron are combined to decode the feature maps produced by the backbone network. Experimental results show that the mNoC@90% values of the proposed algorithm on the GrabCut, Berkeley and DAVIS datasets reach 2.18, 4.04 and 7.39, respectively, which is better than the compared algorithms, and that its time and space complexity is lower than that of f-BRS-B. The proposed algorithm is also stable under perturbations of the interactive click position and click type. These results indicate that the proposed algorithm can segment the objects users are interested in quickly, accurately and stably, and that it improves the user interaction experience. © 2024 Institute of Computing Technology. All rights reserved.
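The abstract reports results as mNoC@90% without defining the metric. As it is commonly used in the interactive segmentation literature, it is the mean number of clicks needed per image to first reach an IoU of 90%. The sketch below illustrates that reading only. The function names `noc` and `mean_noc`, the cap of 20 clicks for images that never reach the threshold, and the example IoU values are all assumptions for illustration, not taken from the paper.

```python
from typing import Sequence


def noc(ious: Sequence[float], threshold: float = 0.90, max_clicks: int = 20) -> int:
    """Number of clicks until IoU first reaches `threshold`.

    `ious[i]` is the IoU after click i + 1. If the threshold is never
    reached within `max_clicks`, the count is capped at `max_clicks`
    (an assumed convention, not confirmed by the abstract).
    """
    for clicks, iou in enumerate(ious[:max_clicks], start=1):
        if iou >= threshold:
            return clicks
    return max_clicks


def mean_noc(
    per_image_ious: Sequence[Sequence[float]],
    threshold: float = 0.90,
    max_clicks: int = 20,
) -> float:
    """Average the per-image click counts over a dataset."""
    counts = [noc(ious, threshold, max_clicks) for ious in per_image_ious]
    return sum(counts) / len(counts)


# Hypothetical IoU-after-each-click traces for three images.
example = [
    [0.72, 0.88, 0.93],  # reaches 90% on the third click
    [0.91],              # reaches 90% on the first click
    [0.40, 0.55, 0.61],  # never reaches 90%, so it counts as max_clicks
]

if __name__ == "__main__":
    print(mean_noc(example))
```

Under this reading, a lower value means fewer user clicks are needed, which is why the abstract presents lower mNoC@90% figures as better.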

Pages: 831-843

Number of pages: 12
