Few-Shot Semantic Segmentation with Cyclic Memory Network

Cited by: 46

Authors:

Xie, Guo-Sen ^{[1,3]}

Xiong, Huan ^{[1,4]}

Liu, Jie ^{[1]}

Yao, Yazhou ^{[3]}

Shao, Ling ^{[2]}

Affiliations:

[1] Mohamed bin Zayed Univ AI, Abu Dhabi, U Arab Emirates

[2] Incept Inst AI, Abu Dhabi, U Arab Emirates

[3] Nanjing Univ Sci & Technol, Nanjing, Jiangsu, Peoples R China

[4] Harbin Inst Technol, Harbin, Heilongjiang, Peoples R China

Source:

2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021) | 2021

Funding:

National Natural Science Foundation of China;

DOI:

10.1109/ICCV48922.2021.00720

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Few-shot semantic segmentation (FSS) is an important task for segmenting novel (unseen) objects when data is scarce. However, most FSS methods rely on unidirectional feature aggregation, e.g., from support prototypes to the query prediction, and from high-resolution features to guide low-resolution ones. This usually fails to fully capture cross-resolution feature relationships and thus leads to inaccurate estimates of the query objects. To resolve this dilemma, we propose a cyclic memory network (CMN) that directly learns to read abundant support information from features at all resolutions in a cyclic manner. Specifically, we first generate N pairs (key and value) of multi-resolution query features guided by the support feature and its mask. Next, we cyclically take one pair of these features as the query to be segmented and write the remaining N-1 pairs into an external memory, i.e., this leave-one-out process is conducted N times. In each cycle, the query feature is updated by collaboratively matching its key and value with the memory, which covers all spatial locations from the different resolutions. Furthermore, we incorporate query feature re-adding and query feature recursive updating mechanisms into the memory reading operation. Equipped with these mechanisms, CMN can capture cross-resolution relationships and better handle object appearance and scale variations in FSS. Experiments on PASCAL-5^{i} and COCO-20^{i} validate the effectiveness of our model for FSS.
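
The leave-one-out cycle described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`memory_read`, `cyclic_memory_update`), the key projection `w_key`, the scaled dot-product softmax matching, and the number of recursive steps are all assumptions made for illustration, and the support-guided feature generation and the segmentation head are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def memory_read(q_key, mem_keys, mem_vals):
    # Assumed matching rule: scaled dot-product attention between query keys
    # (Lq, Ck) and memory keys (Lm, Ck), used to read memory values (Lm, C).
    attn = softmax(q_key @ mem_keys.T / np.sqrt(q_key.shape[1]), axis=1)
    return attn @ mem_vals  # (Lq, C)

def cyclic_memory_update(feats, w_key, num_steps=2):
    # feats: list of N flattened feature maps, one per resolution, each (L_i, C).
    # w_key: hypothetical projection (C, Ck) turning value features into keys.
    n = len(feats)
    if n < 2:
        raise ValueError("need at least two resolutions for leave-one-out")
    keys = [f @ w_key for f in feats]
    updated = []
    for i in range(n):
        # Write the other N-1 (key, value) pairs into the external memory.
        mem_keys = np.concatenate([keys[j] for j in range(n) if j != i], axis=0)
        mem_vals = np.concatenate([feats[j] for j in range(n) if j != i], axis=0)
        feat = feats[i]
        for _ in range(num_steps):
            # Recursive updating: recompute the key from the current feature,
            # read from memory, then re-add the query feature (residual).
            read = memory_read(feat @ w_key, mem_keys, mem_vals)
            feat = feat + read
        updated.append(feat)
    return updated

# Usage: three resolutions (4x4, 8x8, 16x16 positions) with 8 channels.
rng = np.random.default_rng(0)
feats = [rng.standard_normal((l, 8)) for l in (16, 64, 256)]
w_key = rng.standard_normal((8, 4))
out = cyclic_memory_update(feats, w_key, num_steps=2)
```

Each updated feature keeps the shape of its input, so every resolution takes one turn as the query while reading from all spatial locations of the other resolutions.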


Pages: 7273-7282

Page count: 10
