ResMLP: Feedforward Networks for Image Classification With Data-Efficient Training

Cited by: 318
Authors
Touvron, Hugo [1 ,2 ]
Bojanowski, Piotr [1 ]
Caron, Mathilde [1 ]
Cord, Matthieu [2 ]
El-Nouby, Alaaeldin [1 ]
Grave, Edouard [1 ]
Izacard, Gautier [1 ]
Joulin, Armand [1 ]
Synnaeve, Gabriel [1 ]
Verbeek, Jakob [1 ]
Jegou, Herve [1 ]
Affiliations
[1] Facebook AI Res, F-75004 Paris, France
[2] Sorbonne Univ, F-75006 Paris, France
Keywords
Transformers; Training; Computer architecture; Machine translation; Decoding; Task analysis; Knowledge engineering; Multi-layer perceptron; computer-vision; NLP
DOI
10.1109/TPAMI.2022.3206148
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification. It is a simple residual network that alternates (i) a linear layer in which image patches interact, independently and identically across channels, and (ii) a two-layer feed-forward network in which channels interact independently per patch. When trained with a modern training strategy using heavy data-augmentation and optionally distillation, it attains surprisingly good accuracy/complexity trade-offs on ImageNet. We also train ResMLP models in a self-supervised setup, to further remove priors from employing a labelled dataset. Finally, by adapting our model to machine translation we achieve surprisingly good results. We share pre-trained models and our code based on the Timm library.
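The abstract describes the block structure but the record includes no code, so the following is a minimal PyTorch sketch of one ResMLP block under that description. The class name ResMLPBlockSketch, the GELU activation, and the dimensions in the usage line are illustrative assumptions; the authors' released code (built on the Timm library) differs in details such as its normalization and per-layer scaling.

import torch
import torch.nn as nn

class ResMLPBlockSketch(nn.Module):
    # Residual block alternating patch interaction and a per-patch channel MLP,
    # following the description in the abstract (details are assumptions).
    def __init__(self, num_patches: int, dim: int, hidden_dim: int):
        super().__init__()
        # (i) cross-patch sublayer: one linear layer over the patch axis,
        # applied identically and independently to every channel
        self.patch_mixer = nn.Linear(num_patches, num_patches)
        # (ii) cross-channel sublayer: a two-layer feed-forward network,
        # applied independently to every patch
        self.channel_mlp = nn.Sequential(
            nn.Linear(dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, num_patches, dim)
        x = x + self.patch_mixer(x.transpose(1, 2)).transpose(1, 2)
        x = x + self.channel_mlp(x)
        return x

# Usage: a 224x224 image split into 16x16 patches gives 196 patch tokens.
block = ResMLPBlockSketch(num_patches=196, dim=384, hidden_dim=1536)
out = block(torch.randn(2, 196, 384))  # -> shape (2, 196, 384)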
Pages: 5314-5321
Page count: 8