Combined scaling for zero-shot transfer learning

Cited: 22
Authors
Pham, Hieu [1 ]
Dai, Zihang [1 ]
Ghiasi, Golnaz [1 ]
Kawaguchi, Kenji [2 ]
Liu, Hanxiao [1 ]
Yu, Adams Wei [1 ]
Yu, Jiahui [1 ]
Chen, Yi-Ting [1 ]
Luong, Minh-Thang [1 ]
Wu, Yonghui [1 ]
Tan, Mingxing [1 ]
Le, Quoc V. [1 ]
Institutions
[1] Brain Team, Google Research, Mountain View, CA USA
[2] Harvard Univ, Cambridge, MA 02138 USA
Keywords
Deep learning; Computer vision; Deep neural networks; Zero-shot transfer; INFORMED NEURAL-NETWORKS; MODELS;
DOI
10.1016/j.neucom.2023.126658
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent developments in multimodal training methodologies, including CLIP and ALIGN, obviate the necessity for individual data labeling. These approaches utilize pairs of data and corresponding textual information found online as a form of weak supervision signal. However, models employing this kind of weak supervision are not as competitive as their supervised and semi-supervised counterparts when sufficient labeled data is accessible. This performance gap constrains the applicability of weakly supervised models. In this paper, we narrow the gap by proposing a combined scaling method, named BASIC, that achieves 85.7% top-1 accuracy on the ImageNet ILSVRC-2012 validation set without learning from any labeled ImageNet example. This accuracy surpasses the best published similar models, CLIP and ALIGN, by 9.3%. Our BASIC model also shows significant improvements on robustness benchmarks. For instance, on 5 test sets with natural distribution shifts, such as ImageNet-{A,R,V2,Sketch} and ObjectNet, our model achieves 84.3% top-1 average accuracy, only a small drop from its original ImageNet accuracy. To achieve these results, we first develop a theoretical framework which shows that larger contrastive batch sizes lead to smaller generalization gaps for image-text models such as CLIP and ALIGN. Based on this theoretical result, we scale up the contrastive learning framework of CLIP and ALIGN in three dimensions (data size, model size, and batch size) by proposing a new method using gradient checkpointing and model parallelism. As a result, our dataset has 6.6B noisy image-text pairs, which is 4x larger than ALIGN's and 16x larger than CLIP's. Our largest model has 3B weights, which is 3.75x larger in parameters and 8x larger in FLOPs than ALIGN and CLIP. Finally, our batch size is 65,536, which is 2x larger than CLIP's and 4x larger than ALIGN's.
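The contrastive objective being scaled in the abstract is the CLIP/ALIGN-style symmetric cross-entropy over in-batch image-text pairs. The following NumPy sketch (an illustration, not the paper's implementation; function and variable names are hypothetical) shows why batch size matters: each of the B pairs is contrasted against the other B-1 in-batch negatives, so a larger batch yields a harder, better-estimated objective.

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss over a batch of paired
    image/text embeddings, as used by CLIP/ALIGN-style models."""
    # L2-normalize so dot products are cosine similarities.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature      # (B, B) similarity matrix
    diag = np.arange(len(logits))           # matching pairs sit on the diagonal

    def xent(l):
        # Row-wise softmax cross-entropy with the diagonal as the target class.
        l = l - l.max(axis=1, keepdims=True)            # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[diag, diag].mean()

    # Average the image->text and text->image directions.
    return 0.5 * (xent(logits) + xent(logits.T))
```

Correctly paired embeddings drive the loss toward zero, while mismatched pairs are penalized; in practice the (B, B) logits matrix is what gradient checkpointing and model parallelism make feasible at batch size 65,536.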
Pages: 23
Related Papers
50 items total
  • [1] Relational Knowledge Transfer for Zero-Shot Learning
    Wang, Donghui
    Li, Yanan
    Lin, Yuetan
    Zhuang, Yueting
    THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, : 2145 - 2151
  • [2] Hypernetworks for Zero-Shot Transfer in Reinforcement Learning
    Rezaei-Shoshtari, Sahand
    Morissette, Charlotte
    Hogan, Francois R.
    Dudek, Gregory
    Meger, David
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 8, 2023, : 9579 - 9587
  • [3] Zero-Shot Transfer Learning for Event Extraction
    Huang, Lifu
    Ji, Heng
    Cho, Kyunghyun
    Dagan, Ido
    Riedel, Sebastian
    Voss, Clare R.
    PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL), VOL 1, 2018, : 2160 - 2170
  • [4] Transfer Increment for Generalized Zero-Shot Learning
    Feng, Liangjun
    Zhao, Chunhui
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2021, 32 (06) : 2506 - 2520
  • [5] DARLA: Improving Zero-Shot Transfer in Reinforcement Learning
    Higgins, Irina
    Pal, Arka
    Rusu, Andrei
    Matthey, Loic
    Burgess, Christopher
    Pritzel, Alexander
    Botvinick, Matthew
    Blundell, Charles
    Lerchner, Alexander
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017
  • [6] Constrained GPI for Zero-Shot Transfer in Reinforcement Learning
    Kim, Jaekyeom
    Park, Seohong
    Kim, Gunhee
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [7] Structurally Constrained Correlation Transfer for Zero-shot Learning
    Chen, Yu
    Xiong, Yuehan
    Gao, Xing
    Xiong, Hongkai
    2018 IEEE INTERNATIONAL CONFERENCE ON VISUAL COMMUNICATIONS AND IMAGE PROCESSING (IEEE VCIP), 2018,
  • [9] Zero-shot Learning via Recurrent Knowledge Transfer
    Zhao, Bo
    Sun, Xinwei
    Hong, Xiaopeng
    Yao, Yuan
    Wang, Yizhou
    2019 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2019, : 1308 - 1317
  • [10] Deep Unbiased Embedding Transfer for Zero-Shot Learning
    Jia, Zhen
    Zhang, Zhang
    Wang, Liang
    Shan, Caifeng
    Tan, Tieniu
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 : 1958 - 1971