Adder Attention for Vision Transformer

Cited by: 0

Authors:

Shu, Han [1]

Wang, Jiahao [2]

Chen, Hanting [1,3]

Li, Lin [4]

Yang, Yujiu [2]

Wang, Yunhe [1]

Affiliations:

[1] Huawei Noah's Ark Lab, Hong Kong, People's Republic of China

[2] Tsinghua Shenzhen International Graduate School, Shenzhen, People's Republic of China

[3] Peking University, Beijing, People's Republic of China

[4] Huawei Technologies, Shenzhen, People's Republic of China

Source:

Advances in Neural Information Processing Systems 34 (NeurIPS 2021) | 2021, Vol. 34

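Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract: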

Transformers are a recent computational paradigm for deep learning that has shown strong performance on a wide variety of computer vision tasks. However, compared with conventional deep models (e.g., convolutional neural networks), vision transformers require more computational resources, which makes them difficult to deploy on mobile devices. To this end, we propose reducing their energy consumption using adder neural networks (AdderNet). We first theoretically analyze the mechanism of self-attention and the difficulty of applying the adder operation to this module. Specifically, the feature diversity, i.e., the rank of the attention map, cannot be well preserved when only additions are used. Thus, we develop an adder attention layer that includes an additional identity mapping. With the new operation, vision transformers constructed using additions can also provide powerful feature representations. Experimental results on several benchmarks demonstrate that the proposed approach achieves performance highly competitive with that of the baselines while reducing energy consumption by about 2~3x.
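The rank argument in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes the query-key similarity is the negative L1 distance (the AdderNet-style replacement for the dot product), assumes a `sqrt(d)` scaling, and models the "additional identity mapping" as adding an identity matrix to the attention map. The function name `adder_attention` and the `add_identity` flag are hypothetical, and the final aggregation still uses an ordinary matrix product for brevity.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def adder_attention(q, k, v, add_identity=True):
    """Illustrative attention with an L1-distance similarity (assumption).

    q, k, v: arrays of shape (n_tokens, dim).
    Returns (output, attention_map).
    """
    n, d = q.shape
    # Similarity from additions/subtractions only: negative L1 distance
    # between every query and every key, shape (n, n).
    dist = np.abs(q[:, None, :] - k[None, :, :]).sum(axis=-1)
    # The sqrt(d) scaling is an assumption borrowed from standard attention.
    attn = softmax(-dist / np.sqrt(d), axis=-1)
    if add_identity:
        # Hypothetical form of the identity mapping: add I to the map so
        # that its rank cannot collapse.
        attn = attn + np.eye(n)
    # Aggregation kept as a plain matrix product to keep the sketch short.
    return attn @ v, attn
```

When all tokens are identical, the attention map without the identity term is uniform and has rank 1; with the identity term added it has full rank, which is the feature-diversity effect the abstract describes.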


Pages: 11
