ResMLP: Feedforward Networks for Image Classification With Data-Efficient Training

Cited by: 318
Authors
Touvron, Hugo [1 ,2 ]
Bojanowski, Piotr [1 ]
Caron, Mathilde [1 ]
Cord, Matthieu [2 ]
El-Nouby, Alaaeldin [1 ]
Grave, Edouard [1 ]
Izacard, Gautier [1 ]
Joulin, Armand [1 ]
Synnaeve, Gabriel [1 ]
Verbeek, Jakob [1 ]
Jegou, Herve [1 ]
Affiliations
[1] Facebook AI Res, F-75004 Paris, France
[2] Sorbonne Univ, F-75006 Paris, France
Keywords
Transformers; Training; Computer architecture; Machine translation; Decoding; Task analysis; Knowledge engineering; Multi-layer perceptron; computer-vision; NLP
DOI
10.1109/TPAMI.2022.3206148
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification. It is a simple residual network that alternates (i) a linear layer in which image patches interact, independently and identically across channels, and (ii) a two-layer feed-forward network in which channels interact independently per patch. When trained with a modern training strategy using heavy data-augmentation and optionally distillation, it attains surprisingly good accuracy/complexity trade-offs on ImageNet. We also train ResMLP models in a self-supervised setup, to further remove priors from employing a labelled dataset. Finally, by adapting our model to machine translation we achieve surprisingly good results. We share pre-trained models and our code based on the Timm library.
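The abstract describes the block structure but the record includes no code, so the following is a minimal PyTorch sketch of one ResMLP block under that description. The class name ResMLPBlockSketch, the GELU activation, and the dimensions in the usage line are illustrative assumptions; the authors' released code (built on the Timm library) differs in details such as its normalization and per-layer scaling.

import torch
import torch.nn as nn

class ResMLPBlockSketch(nn.Module):
    # Residual block alternating patch interaction and a per-patch channel MLP,
    # following the description in the abstract (details are assumptions).
    def __init__(self, num_patches: int, dim: int, hidden_dim: int):
        super().__init__()
        # (i) cross-patch sublayer: one linear layer over the patch axis,
        # applied identically and independently to every channel
        self.patch_mixer = nn.Linear(num_patches, num_patches)
        # (ii) cross-channel sublayer: a two-layer feed-forward network,
        # applied independently to every patch
        self.channel_mlp = nn.Sequential(
            nn.Linear(dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, num_patches, dim)
        x = x + self.patch_mixer(x.transpose(1, 2)).transpose(1, 2)
        x = x + self.channel_mlp(x)
        return x

# Usage: a 224x224 image split into 16x16 patches gives 196 patch tokens.
block = ResMLPBlockSketch(num_patches=196, dim=384, hidden_dim=1536)
out = block(torch.randn(2, 196, 384))  # -> shape (2, 196, 384)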
Pages: 5314-5321
Page count: 8