Dense Semantic Contrast for Self-Supervised Visual Representation Learning

Cited by: 21

Authors:

Li, Xiaoni [1,2]

Zhou, Yu [1,2]

Zhang, Yifei [1,2]

Zhang, Aoting [1]

Wang, Wei [1,2]

Jiang, Ning [3]

Wu, Haiying [3]

Wang, Weiping [1]

Affiliations:

[1] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China

[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China

[3] Mashang Consumer Finance Co Ltd, Beijing, Peoples R China

Source:

PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021 | 2021

Funding:

National Natural Science Foundation of China;

Keywords:

Self-Supervised Learning; Representation Learning; Contrastive Learning; Dense Representation; Semantics Discovery;

DOI:

10.1145/3474085.3475551

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Self-supervised representation learning for visual pre-training has achieved remarkable success with sample (instance or pixel) discrimination and instance-level semantics discovery, yet a non-negligible gap remains between pre-trained models and downstream dense prediction tasks. Concretely, these downstream tasks require more accurate representations: pixels from the same object must belong to a shared semantic category, a property that previous methods lack. In this work, we present Dense Semantic Contrast (DSC) for modeling semantic category decision boundaries at a dense level to meet the requirement of these tasks. Furthermore, we propose a dense cross-image semantic contrastive learning framework for multi-granularity representation learning. Specifically, we explicitly explore the semantic structure of the dataset by mining relations among pixels from different perspectives. For intra-image relation modeling, we discover pixel neighbors from multiple views. For inter-image relations, we enforce pixel representations from the same semantic class to be more similar than representations from different classes in one mini-batch. Experimental results show that our DSC model outperforms state-of-the-art methods when transferring to downstream dense prediction tasks, including object detection, semantic segmentation, and instance segmentation. Code will be made available.
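
The inter-image objective described in the abstract (pixels of the same semantic class pulled together, pixels of different classes pushed apart within a mini-batch) can be illustrated with a small sketch. The abstract does not give the loss formula, so the code below substitutes a generic supervised-contrastive (InfoNCE-style) loss over pixel embeddings and should not be read as the authors' implementation. The function name `dense_semantic_contrast_loss`, the `temperature` value, and the assumption that per-pixel pseudo-class labels are already available (for example from clustering) are all hypothetical choices made for illustration.

```python
import numpy as np


def dense_semantic_contrast_loss(features, labels, temperature=0.2):
    """Illustrative pixel-level contrastive loss (not the paper's exact loss).

    features: (N, D) array of pixel embeddings pooled from a mini-batch
              (rows must be non-zero vectors; N >= 2).
    labels:   (N,) array of integer pseudo-class ids, one per pixel
              (assumed to be given, e.g. by a clustering step).
    """
    features = np.asarray(features, dtype=np.float64)
    labels = np.asarray(labels)

    # L2-normalise so that dot products are cosine similarities.
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    logits = z @ z.T / temperature

    # A pixel is never contrasted with itself.
    n = z.shape[0]
    self_mask = np.eye(n, dtype=bool)
    logits = np.where(self_mask, -np.inf, logits)

    # Row-wise log-softmax over all other pixels in the batch.
    row_max = logits.max(axis=1, keepdims=True)
    shifted = logits - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Positives: other pixels that share the anchor's pseudo-class.
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    valid = pos_count > 0  # skip anchors that have no positive in the batch

    # Average log-probability of the positives, negated, per anchor.
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    loss_per_anchor = -pos_log_prob[valid] / pos_count[valid]
    return float(loss_per_anchor.mean())


# Four toy "pixels" forming two tight groups in embedding space.
toy_features = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
loss_consistent = dense_semantic_contrast_loss(toy_features, [0, 0, 1, 1])
loss_mismatched = dense_semantic_contrast_loss(toy_features, [0, 1, 0, 1])
```

In this toy case the loss is lower when the labels agree with the geometry of the embeddings (`loss_consistent`) than when they cut across it (`loss_mismatched`), which is the behaviour the abstract asks of the learned representation.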

Pages: 1368-1376

Page count: 9

