Fast Vision Transformer via Additive Attention

Cited by: 0
Authors
Wen, Yang [1 ]
Chen, Samuel [2 ]
Shrestha, Abhishek Krishna [2 ]
Affiliations
[1] Shenzhen Univ, Coll Elect & Informat Engn, Shenzhen, Peoples R China
[2] Xidian Univ, Sch Artificial Intelligence, Xian, Peoples R China
Source
2024 IEEE CONFERENCE ON ARTIFICIAL INTELLIGENCE, CAI 2024 | 2024
Funding
National Natural Science Foundation of China;
Keywords
Fast Vision Transformer; Additive Attention;
DOI
10.1109/CAI59869.2024.00113
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
The Vision Transformer has proven a more effective architecture for computer vision tasks than convolutional neural networks (CNNs). However, it is computationally expensive because the complexity of self-attention is quadratic in the input sequence length. In this paper, a Fast Vision Transformer (FViT) is proposed based on an additive attention module, which reduces the attention complexity from quadratic to linear in the sequence length. Experimental results show that the proposed model achieves faster inference with a smaller memory footprint.
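The record does not spell out the attention formulation, but the cited Fastformer paper [6] describes additive attention that pools the queries and keys into single global vectors via learned scalar scores, so the cost grows linearly in the number of tokens. The following is a minimal, hypothetical single-head sketch in PyTorch of that Fastformer-style mechanism, written as an illustration under that assumption rather than the authors' implementation; every name in it (AdditiveAttention, w_q, w_k, and so on) is invented for this example.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    """Illustrative single-head additive attention in the style of
    Fastformer (Wu et al., 2021); cost is linear in sequence length."""

    def __init__(self, dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        self.w_q = nn.Linear(dim, 1)   # scores tokens to pool a global query
        self.w_k = nn.Linear(dim, 1)   # scores tokens to pool a global key
        self.to_out = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        q, k, v = self.to_q(x), self.to_k(x), self.to_v(x)

        # Pool all queries into one global query vector: O(n), not O(n^2).
        alpha = F.softmax(self.w_q(q) * self.scale, dim=1)   # (B, n, 1)
        global_q = (alpha * q).sum(dim=1, keepdim=True)      # (B, 1, d)

        # Mix the global query into every key element-wise, then pool again.
        p = k * global_q                                     # (B, n, d)
        beta = F.softmax(self.w_k(p) * self.scale, dim=1)    # (B, n, 1)
        global_k = (beta * p).sum(dim=1, keepdim=True)       # (B, 1, d)

        # Modulate values by the global key; project, with a query residual.
        u = v * global_k                                     # (B, n, d)
        return self.to_out(u) + q

For 14x14 = 196 patch tokens plus a class token, for instance, attn = AdditiveAttention(64) applied to torch.randn(2, 197, 64) returns a tensor of shape (2, 197, 64); every step above touches each token once, so both time and memory scale linearly with the 197 tokens.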
Pages: 573-574
Page count: 2
References
6 entries
[1] Deng, J., et al. ImageNet: A Large-Scale Hierarchical Image Database. Proc. IEEE CVPR, 2009, p. 248. DOI: 10.1109/CVPR.2009.5206848
[2] Dosovitskiy, A., et al. An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale. Proc. ICLR, 2021.
[3] Krizhevsky, A., Sutskever, I., Hinton, G. E. ImageNet Classification with Deep Convolutional Neural Networks. Communications of the ACM, 2017, 60(6): 84-90.
[4] Liu, Y., Zhang, Y., Wang, Y., Hou, F., Yuan, J., Tian, J., Zhang, Y., Shi, Z., Fan, J., He, Z. A Survey of Visual Transformers. IEEE Transactions on Neural Networks and Learning Systems, 2024, 35(6): 7478-7498.
[5] Vaswani, A., et al. Attention Is All You Need. Advances in Neural Information Processing Systems, 2017, vol. 30.
[6] Wu, C., et al. Fastformer: Additive Attention Can Be All You Need. arXiv:2108.09084, 2021.