Few-Shot Semantic Segmentation with Cyclic Memory Network

Times cited: 46

Authors:

Xie, Guo-Sen [1, 3]

Xiong, Huan [1, 4]

Liu, Jie [1]

Yao, Yazhou [3]

Shao, Ling [2]

Affiliations:

[1] Mohamed bin Zayed Univ AI, Abu Dhabi, U Arab Emirates

[2] Incept Inst AI, Abu Dhabi, U Arab Emirates

[3] Nanjing Univ Sci & Technol, Nanjing, Jiangsu, Peoples R China

[4] Harbin Inst Technol, Harbin, Heilongjiang, Peoples R China

Source:

2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021) | 2021

Funding:

National Natural Science Foundation of China


DOI:

10.1109/ICCV48922.2021.00720

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Few-shot semantic segmentation (FSS) is an important task for segmenting novel (unseen) objects when labeled data are scarce. However, most FSS methods rely on unidirectional feature aggregation, e.g., from support prototypes to the query prediction, and from high-resolution features to guide low-resolution ones. Such schemes usually fail to fully capture cross-resolution feature relationships and thus lead to inaccurate estimates of the query objects. To address this, we propose a cyclic memory network (CMN) that directly learns to read abundant support information from features at all resolutions in a cyclic manner. Specifically, we first generate N pairs (key and value) of multi-resolution query features guided by the support feature and its mask. Next, we cyclically take one pair of these features as the query to be segmented and write the remaining N-1 pairs into an external memory; this leave-one-out process is repeated N times. In each cycle, the query feature is updated by jointly matching its key and value against the memory, which covers all spatial locations across the different resolutions. Furthermore, we incorporate query feature re-adding and recursive query feature updating mechanisms into the memory reading operation. Equipped with these mechanisms, CMN can capture cross-resolution relationships and better handle object appearance and scale variations in FSS. Experiments on PASCAL-5^i and COCO-20^i validate the effectiveness of our model for FSS.
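The leave-one-out memory read described in the abstract can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Keys and values are random stand-ins for the support-guided query features. All resolutions are assumed to share the same key and value channel counts, so their flattened spatial locations can be concatenated into one memory. Similarity is assumed to be a scaled dot product with a softmax over memory locations, in the style of space-time memory networks. "Re-adding" is modeled as a residual addition of the query value, and the recursive updating mechanism is not modeled. The helper names `memory_read` and `cyclic_memory_pass` are hypothetical.

```python
import numpy as np


def softmax(x, axis):
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def memory_read(query_key, query_value, mem_key, mem_value):
    """Key-value memory read (hypothetical helper, not the paper's code).

    query_key:   (Ck, Lq) keys at Lq query locations
    query_value: (Cv, Lq) values at Lq query locations
    mem_key:     (Ck, Lm) keys at Lm memory locations
    mem_value:   (Cv, Lm) values at Lm memory locations
    returns:     (Cv, Lq) updated query feature
    """
    ck = query_key.shape[0]
    # Affinity between every memory location and every query location
    # (scaled dot product is an assumption).
    affinity = mem_key.T @ query_key / np.sqrt(ck)  # (Lm, Lq)
    # Normalize over memory locations so each query location gets a
    # convex combination of memory values.
    weights = softmax(affinity, axis=0)
    read = mem_value @ weights  # (Cv, Lq)
    # Assumed form of "query feature re-adding": a residual addition.
    return read + query_value


def cyclic_memory_pass(pairs):
    """One cyclic pass over N (key, value) pairs, one pair per resolution.

    Each pair is flattened spatially: key (Ck, H*W), value (Cv, H*W).
    Requires N >= 2 so the memory is never empty.
    """
    updated = []
    for i, (qk, qv) in enumerate(pairs):
        # Leave-one-out: every other resolution is written into memory.
        rest = [p for j, p in enumerate(pairs) if j != i]
        mem_key = np.concatenate([k for k, _ in rest], axis=1)
        mem_value = np.concatenate([v for _, v in rest], axis=1)
        updated.append(memory_read(qk, qv, mem_key, mem_value))
    return updated
```

For example, with three resolutions (8x8, 16x16, 32x32), 64 key channels and 128 value channels:

```python
rng = np.random.default_rng(0)
sizes = [8 * 8, 16 * 16, 32 * 32]
pairs = [(rng.standard_normal((64, n)), rng.standard_normal((128, n))) for n in sizes]
out = cyclic_memory_pass(pairs)
print([o.shape for o in out])  # [(128, 64), (128, 256), (128, 1024)]
```

Each cycle lets one resolution attend to every spatial location of all the other resolutions at once, which is how the sketch mirrors the abstract's claim that the memory read covers all spatial locations across resolutions.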

Pages: 7273-7282

Number of pages: 10

References: 47 in total
