39 entries in total
[1]
Bao H, 2021, arXiv
[2]
Emerging Properties in Self-Supervised Vision Transformers [J]. 2021 IEEE/CVF International Conference on Computer Vision (ICCV 2021), 2021: 9630-9640.
[3]
Deep Clustering for Unsupervised Learning of Visual Features [J]. Computer Vision - ECCV 2018, Pt. XIV, 2018, 11218: 139-156.
[4]
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts [J]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 2021: 3557-3567.
[5]
Chen T, 2020, arXiv, arXiv:2002.05709
[6]
Chen T, 2020, arXiv, arXiv:2006.10029
[7]
Chen XL, 2021, arXiv, arXiv:2104.02057
[8]
VirTex: Learning Visual Representations from Textual Annotations [J]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 2021: 11157-11168.
[9]
Devlin J, 2019, 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT 2019), Vol. 1, p. 4171
[10]
Dosovitskiy A, 2021, arXiv, arXiv:2010.11929, DOI 10.48550/arXiv.2010.11929