Combined scaling for zero-shot transfer learning

Cited by: 23

Authors:

Pham, Hieu [1]
Dai, Zihang [1]
Ghiasi, Golnaz [1]
Kawaguchi, Kenji [2]
Liu, Hanxiao [1]
Yu, Adams Wei [1]
Yu, Jiahui [1]
Chen, Yi-Ting [1]
Luong, Minh-Thang [1]
Wu, Yonghui [1]
Tan, Mingxing [1]
Le, Quoc V. [1]

Affiliations:

[1] Brain Team, Google Res, Mountain View, CA USA

[2] Harvard Univ, Cambridge, MA 02138 USA

Source:

Neurocomputing | 2023 / Vol. 555

Keywords:

Deep learning; Computer vision; Deep neural networks; Zero-shot transfer; Informed neural-networks; Models

DOI:

10.1016/j.neucom.2023.126658

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Recent developments in multimodal training methods, including CLIP and ALIGN, remove the need for individually labeled data. These approaches use pairs of data and corresponding text found online as a weak supervision signal. However, models trained with this kind of weak supervision are not as competitive as their supervised and semi-supervised counterparts when sufficient labeled data is available. This performance gap limits the applicability of weakly supervised models. In this paper, we narrow the gap by proposing a combined scaling method, named BASIC, that achieves 85.7% top-1 accuracy on the ImageNet ILSVRC-2012 validation set without learning from any labeled ImageNet example. This accuracy surpasses the best published similar models, CLIP and ALIGN, by 9.3%. Our BASIC model also shows significant improvements on robustness benchmarks. For instance, on 5 test sets with natural distribution shifts, such as ImageNet-{A,R,V2,Sketch} and ObjectNet, our model achieves 84.3% top-1 average accuracy, only a small drop from its original ImageNet accuracy. To achieve these results, we first develop a theoretical framework that shows that larger contrastive batch sizes lead to smaller generalization gaps for image-text models such as CLIP and ALIGN. Based on this theoretical result, we scale up the contrastive learning framework of CLIP and ALIGN in three dimensions (data size, model size, and batch size) by proposing a new method that uses gradient checkpointing and model parallelism. As a result, our dataset has 6.6B noisy image-text pairs, which is 4x larger than ALIGN's and 16x larger than CLIP's. Our largest model has 3B weights, which is 3.75x larger in parameters and 8x larger in FLOPs than ALIGN and CLIP. Finally, our batch size is 65536, which is 2x larger than CLIP's and 4x larger than ALIGN's.
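
To make the role of the contrastive batch size concrete, the following is a minimal sketch of a symmetric image-text contrastive loss in the style commonly used for CLIP- and ALIGN-like models. It is an illustration only, not the authors' implementation: the function names, the temperature value of 0.07, and the use of plain Python lists in place of encoder outputs are all assumptions made for this example.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss over one batch (hypothetical helper).

    Row i of image_embs and row i of text_embs are assumed to be a matching
    pair; every other row in the batch serves as a negative example.
    The temperature default is an assumed value, not taken from the paper.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j]: temperature-scaled cosine similarity of image i and text j.
    logits = [
        [sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Image-to-text direction: softmax across each row.
    i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image direction: softmax across each column.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    t2i = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return 0.5 * (i2t + t2i)


# Toy batch of two pairs: correctly paired embeddings versus swapped captions.
aligned = [[1.0, 0.0], [0.0, 1.0]]
loss_match = contrastive_loss(aligned, aligned)
loss_swapped = contrastive_loss(aligned, aligned[::-1])
```

In this sketch each example is contrasted against the other n - 1 items in the batch, so a batch of 65536 supplies 65535 negatives per example in each direction. That count is the quantity that grows when the batch size is scaled up.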

Pages: 23

