Fast Vision Transformer via Additive Attention

Cited by: 0
Authors
Wen, Yang [1 ]
Chen, Samuel [2 ]
Shrestha, Abhishek Krishna [2 ]
Affiliations
[1] Shenzhen Univ, Coll Elect & Informat Engn, Shenzhen, Peoples R China
[2] Xidian Univ, Sch Artificial Intelligence, Xian, Peoples R China
Source
2024 IEEE CONFERENCE ON ARTIFICIAL INTELLIGENCE, CAI 2024 | 2024
Funding
National Natural Science Foundation of China;
Keywords
Fast Vision Transformer; Additive Attention;
DOI
10.1109/CAI59869.2024.00113
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
The Vision Transformer has proven a more effective architecture for computer vision tasks than convolutional neural networks (CNNs). However, it is computationally expensive because the complexity of self-attention is quadratic in the input sequence length. In this paper, a Fast Vision Transformer (FViT) is proposed based on an additive attention module, which reduces the attention complexity from quadratic to linear in the sequence length. Experimental results show that the proposed model achieves faster inference with a smaller memory footprint.
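The record does not spell out the attention formulation, but the cited Fastformer paper [6] describes additive attention that pools the queries and keys into single global vectors via learned scalar scores, so the cost grows linearly in the number of tokens. The following is a minimal, hypothetical single-head sketch in PyTorch of that Fastformer-style mechanism, written as an illustration under that assumption rather than the authors' implementation; every name in it (AdditiveAttention, w_q, w_k, and so on) is invented for this example.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    """Illustrative single-head additive attention in the style of
    Fastformer (Wu et al., 2021); cost is linear in sequence length."""

    def __init__(self, dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        self.w_q = nn.Linear(dim, 1)   # scores tokens to pool a global query
        self.w_k = nn.Linear(dim, 1)   # scores tokens to pool a global key
        self.to_out = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        q, k, v = self.to_q(x), self.to_k(x), self.to_v(x)

        # Pool all queries into one global query vector: O(n), not O(n^2).
        alpha = F.softmax(self.w_q(q) * self.scale, dim=1)   # (B, n, 1)
        global_q = (alpha * q).sum(dim=1, keepdim=True)      # (B, 1, d)

        # Mix the global query into every key element-wise, then pool again.
        p = k * global_q                                     # (B, n, d)
        beta = F.softmax(self.w_k(p) * self.scale, dim=1)    # (B, n, 1)
        global_k = (beta * p).sum(dim=1, keepdim=True)       # (B, 1, d)

        # Modulate values by the global key; project, with a query residual.
        u = v * global_k                                     # (B, n, d)
        return self.to_out(u) + q

For 14x14 = 196 patch tokens plus a class token, for instance, attn = AdditiveAttention(64) applied to torch.randn(2, 197, 64) returns a tensor of shape (2, 197, 64); every step above touches each token once, so both time and memory scale linearly with the 197 tokens.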
Pages: 573-574
Page count: 2
References
6 entries
[1] Deng, J., et al. ImageNet: A Large-Scale Hierarchical Image Database. Proc. IEEE CVPR, 2009, p. 248. DOI: 10.1109/CVPR.2009.5206848
[2] Dosovitskiy, A., et al. An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale. Proc. ICLR, 2021.
[3] Krizhevsky, A., Sutskever, I., Hinton, G. E. ImageNet Classification with Deep Convolutional Neural Networks. Communications of the ACM, 2017, 60(6): 84-90.
[4] Liu, Y., Zhang, Y., Wang, Y., Hou, F., Yuan, J., Tian, J., Zhang, Y., Shi, Z., Fan, J., He, Z. A Survey of Visual Transformers. IEEE Transactions on Neural Networks and Learning Systems, 2024, 35(6): 7478-7498.
[5] Vaswani, A., et al. Attention Is All You Need. Advances in Neural Information Processing Systems, 2017, vol. 30.
[6] Wu, C., et al. Fastformer: Additive Attention Can Be All You Need. arXiv:2108.09084, 2021.