Un-mix: Rethinking Image Mixtures for Unsupervised Visual Representation Learning

Cited by: 0
Authors
Shen, Zhiqiang [1 ,4 ]
Liu, Zechun [2 ]
Liu, Zhuang [3 ]
Savvides, Marios [1 ]
Darrell, Trevor [3 ]
Xing, Eric [1 ,4 ]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[2] Meta Inc, Real Labs, Menlo Pk, CA USA
[3] Univ Calif Berkeley, Berkeley, CA 94720 USA
[4] Mohamed bin Zayed Univ Artificial Intelligence, Abu Dhabi, U Arab Emirates
Source
THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2022
Keywords
DOI
None available
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently advanced unsupervised learning approaches use a siamese-like framework to compare two "views" of the same image for representation learning. Making the two views distinctive is core to guaranteeing that unsupervised methods learn meaningful information. However, such frameworks can be prone to overfitting if the augmentations used to generate the two views are not strong enough, leading to overconfidence on the training data. This drawback hinders the model from learning subtle variance and fine-grained information. To address this, in this work we introduce the concept of soft distance in label space to the contrastive unsupervised learning task, making the model aware of the soft degree of similarity between positive or negative pairs by mixing the input data space, so that the input and loss spaces work collaboratively. Despite its conceptual simplicity, we show empirically that with our solution, Unsupervised image mixtures (Un-Mix), we can learn subtler, more robust and generalized representations from the transformed input and corresponding new label space. Extensive experiments are conducted on CIFAR-10, CIFAR-100, STL-10, Tiny ImageNet and standard ImageNet-1K with the popular unsupervised methods SimCLR, BYOL, MoCo V1&V2, SwAV, etc. Our proposed image mixture and label assignment strategy obtains consistent improvements of 1-3% while following exactly the same hyperparameters and training procedures as the base methods. Code is publicly available at https://github.com/szq0214/Un-Mix.
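The core idea in the abstract, mixing two inputs and letting the mixture coefficient act as a soft degree of similarity in the loss, can be sketched minimally as follows. This is an illustrative mockup, not the paper's implementation: the function names, the toy cosine-distance loss, and the use of flat vectors in place of encoder features are all assumptions made here for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def mix_images(x_a, x_b, lam):
    # Global mixup-style mixing: pixel-wise convex combination of two inputs.
    return lam * x_a + (1.0 - lam) * x_b

def cosine_sim(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy "images" as flat vectors standing in for encoder outputs.
x_a = rng.normal(size=8)
x_b = rng.normal(size=8)

# Mixture coefficient, typically drawn from a Beta distribution in mixup-style methods.
lam = float(rng.beta(1.0, 1.0))

x_mix = mix_images(x_a, x_b, lam)

# Soft label assignment: the mixed sample is lam-similar to x_a and
# (1 - lam)-similar to x_b, so the pairwise loss terms are reweighted accordingly.
loss = lam * (1.0 - cosine_sim(x_mix, x_a)) + (1.0 - lam) * (1.0 - cosine_sim(x_mix, x_b))
```

In the actual method the mixing is applied inside the siamese framework of the base approach (SimCLR, BYOL, MoCo, etc.), and the reweighted terms enter that method's own contrastive or similarity loss unchanged otherwise.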
Pages: 2216-2224
Page count: 9