FLatten Transformer: Vision Transformer using Focused Linear Attention

Cited by: 121
Authors
Han, Dongchen [1 ]
Pan, Xuran [1 ]
Han, Yizeng [1 ]
Song, Shiji [1 ]
Huang, Gao [1 ]
Affiliations
[1] Tsinghua Univ, Dept Automat, BNRist, Beijing, Peoples R China
Source
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV | 2023
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
DOI
10.1109/ICCV51070.2023.00548
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
The quadratic computation complexity of self-attention has been a persistent challenge when applying Transformer models to vision tasks. Linear attention, on the other hand, offers a much more efficient alternative with its linear complexity by approximating the Softmax operation through carefully designed mapping functions. However, current linear attention approaches either suffer from significant performance degradation or introduce additional computation overhead from the mapping functions. In this paper, we propose a novel Focused Linear Attention module to achieve both high efficiency and expressiveness. Specifically, we first analyze the factors contributing to the performance degradation of linear attention from two perspectives: the focus ability and feature diversity. To overcome these limitations, we introduce a simple yet effective mapping function and an efficient rank restoration module to enhance the expressiveness of self-attention while maintaining low computation complexity. Extensive experiments show that our linear attention module is applicable to a variety of advanced vision Transformers, and achieves consistently improved performances on multiple benchmarks. Code is available at https://github.com/LeapLabTHU/FLatten-Transformer.
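The linear complexity the abstract refers to comes from a kernel-trick reordering: replacing Softmax(QK^T)V with phi(Q)(phi(K)^T V), so the N-by-N attention matrix is never materialized. Below is a minimal NumPy sketch of this reordering, using a norm-preserving, direction-sharpening mapping loosely modeled on the paper's described "focused" function; the exponent `p`, the `eps` constant, and the single-head, unbatched shapes are illustrative assumptions, not the authors' implementation (which, per the abstract, also adds a rank restoration module not shown here).

```python
import numpy as np

def focused_map(x, p=3):
    # Sketch of a focused mapping: keep each feature vector's norm,
    # but sharpen its direction by raising (non-negative) entries to
    # a power p and renormalizing. p and eps are assumptions.
    eps = 1e-6
    x = np.maximum(x, 0) + eps                          # non-negative features
    norm = np.linalg.norm(x, axis=-1, keepdims=True)    # original magnitude
    xp = x ** p                                         # sharpen direction
    xp_norm = np.linalg.norm(xp, axis=-1, keepdims=True)
    return norm * xp / xp_norm

def linear_attention(Q, K, V):
    # Kernel-trick reordering: phi(Q) @ (phi(K)^T @ V) costs O(N * d^2),
    # versus O(N^2 * d) for softmax attention, since phi(K)^T @ V is a
    # small (d, d_v) matrix independent of sequence length N.
    q, k = focused_map(Q), focused_map(K)
    kv = k.T @ V                                        # (d, d_v)
    z = q @ k.sum(axis=0, keepdims=True).T + 1e-6       # (N, 1) normalizer
    return (q @ kv) / z

# Toy example: 16 tokens, 8-dim features.
N, d = 16, 8
rng = np.random.default_rng(0)
Q, K, V = rng.standard_normal((3, N, d))
out = linear_attention(Q, K, V)
print(out.shape)                                        # (16, 8)
```

Because the mapped queries and keys are non-negative, each output row is a convex combination of the rows of V, mirroring the averaging behavior of softmax attention without its quadratic cost.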
Pages: 5938-5948
Page count: 11