Adder Attention for Vision Transformer

Times Cited: 0
Authors
Shu, Han [1 ]
Wang, Jiahao [2 ]
Chen, Hanting [1 ,3 ]
Li, Lin [4 ]
Yang, Yujiu [2 ]
Wang, Yunhe [1 ]
Affiliations
[1] Huawei Noah's Ark Lab, Hong Kong, Peoples R China
[2] Tsinghua Shenzhen Int Grad Sch, Shenzhen, Peoples R China
[3] Peking Univ, Beijing, Peoples R China
[4] Huawei Technol, Shenzhen, Peoples R China
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021) | 2021 / Vol. 34
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
The transformer is a new computational paradigm for deep learning that has shown strong performance on a wide variety of computer vision tasks. However, compared with conventional deep models (e.g., convolutional neural networks), vision transformers require more computational resources and cannot be easily deployed on mobile devices. To this end, we propose to reduce their energy consumption using adder neural networks (AdderNet). We first theoretically analyze the mechanism of self-attention and the difficulty of applying adder operations to this module. Specifically, the feature diversity, i.e., the rank of the attention map, cannot be well preserved using only additions. We therefore develop an adder attention layer that includes an additional identity mapping. With the new operation, vision transformers constructed using additions can also provide powerful feature representations. Experimental results on several benchmarks demonstrate that the proposed approach achieves performance highly competitive with that of the baselines while reducing energy consumption by roughly 2~3x.
Pages: 11
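The following is a minimal sketch of how an adder-style attention layer of the kind described in the abstract could look in PyTorch, assuming the dot-product similarity is replaced by a negative L1 distance (computed with additions and subtractions only) and an identity mapping is added to help preserve the rank of the attention output. The class and parameter names (AdderAttentionSketch, alpha) are hypothetical, this is not the authors' reference implementation, and the value aggregation below is kept as a standard matrix product for brevity.

import torch
import torch.nn as nn


class AdderAttentionSketch(nn.Module):
    """Illustrative adder-style self-attention with an identity shortcut (sketch only)."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)
        # Learnable weight for the identity term (an assumption, not taken from the paper).
        self.alpha = nn.Parameter(torch.ones(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)  # each: (B, heads, N, head_dim)

        # Similarity via negative L1 distance (additions/subtractions only) instead of q @ k^T.
        attn = -(q.unsqueeze(-2) - k.unsqueeze(-3)).abs().sum(dim=-1)  # (B, heads, N, N)
        attn = attn.softmax(dim=-1)

        out = attn @ v                   # value aggregation (kept as a matmul for brevity)
        out = out + self.alpha * v       # additional identity mapping to preserve feature diversity
        out = out.transpose(1, 2).reshape(B, N, C)
        return self.proj(out)


if __name__ == "__main__":
    layer = AdderAttentionSketch(dim=64, num_heads=8)
    tokens = torch.randn(2, 16, 64)      # (batch, tokens, embedding dim)
    print(layer(tokens).shape)           # torch.Size([2, 16, 64])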