Real-time Transformer Inference on Edge AI Accelerators

Cited by: 1
Authors
Reidy, Brendan [1 ]
Mohammadi, Mohammadreza [1 ]
Elbtity, Mohammed [1 ]
Smith, Heath [1 ]
Zand, Ramtin [1 ]
Affiliations
[1] Univ South Carolina, Dept Comp Sci & Engn, Columbia, SC 29208 USA
Source
2023 IEEE 29th Real-Time and Embedded Technology and Applications Symposium (RTAS) | 2023
Keywords
Tensor Processing Unit (TPU); Transformer Models; Edge AI Accelerators; BERT;
DOI
10.1109/RTAS58335.2023.00036
Chinese Library Classification
TP3 [Computing Technology, Computer Technology]
Discipline Code
0812
Abstract
Transformer models have become a dominant architecture in machine learning. From natural language processing to more recent computer vision applications, Transformers have shown remarkable results and established a new state of the art in many domains. This increase in performance, however, has come at the cost of ever-growing model sizes that require more resources to deploy. Machine learning (ML) models are used in many real-world systems, such as robotics, mobile devices, and Internet of Things (IoT) devices, that require fast inference with low energy consumption; for battery-powered devices, lower energy consumption translates directly into longer battery life. Several edge AI accelerators have been developed to address these issues. Among them, the Coral Edge TPU has shown promising results for image classification while maintaining very low energy consumption. However, many of these devices, including the Coral Edge TPU, were originally designed to accelerate convolutional neural networks, which makes deploying Transformers challenging. Here, we propose a methodology for deploying Transformers on the Edge TPU. We provide extensive latency, power, and energy comparisons across leading edge devices and show that our methodology enables real-time Transformer inference while achieving the lowest power and energy consumption among edge devices on the market.
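Although this record does not include the paper's implementation details, the Coral Edge TPU toolchain it targets accepts only fully integer-quantized TFLite models, so any Transformer deployment must pass through post-training int8 quantization followed by the edgetpu_compiler CLI. The Python sketch below illustrates that generic conversion path, not the authors' exact methodology; the SavedModel directory, tensor shapes, calibration data, and the choice to keep the integer embedding lookup on the CPU are all illustrative assumptions.

import numpy as np
import tensorflow as tf

# Illustrative names and shapes; the paper's exact model and export are
# not specified in this record.
SAVED_MODEL_DIR = "bert_encoder_savedmodel"  # assumed: encoder stack only,
SEQ_LEN, HIDDEN = 128, 312                   # fed float embeddings so the
                                             # integer lookup stays on the CPU

def representative_dataset():
    # Calibration samples for post-training quantization; embeddings drawn
    # from a real corpus should replace this random stand-in.
    for _ in range(100):
        yield [np.random.randn(1, SEQ_LEN, HIDDEN).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model(SAVED_MODEL_DIR)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# The Edge TPU executes only full-integer (int8) operations.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open("bert_encoder_int8.tflite", "wb") as f:
    f.write(converter.convert())

# Ops supported by the Edge TPU are then mapped onto it with Coral's
# compiler; anything else falls back to the CPU:
#   $ edgetpu_compiler bert_encoder_int8.tflite

After compilation, any operation the Edge TPU cannot execute is partitioned back onto the CPU, which is why the device's CNN-oriented constraints (static shapes, int8-only ops) dictate how the Transformer graph must be exported.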
Pages: 341-344 (4 pages)