NLP-Fast: A Fast, Scalable, and Flexible System to Accelerate Large-Scale Heterogeneous NLP Models

Cited by: 4
Authors
Kim, Joonsung [1 ]
Hur, Suyeon [1 ]
Lee, Eunbok [1 ]
Lee, Seungho [1 ]
Kim, Jangwoo [1 ]
Affiliations
[1] Seoul Natl Univ, Dept Elect & Comp Engn, Seoul, South Korea
Source
30TH INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES (PACT 2021) | 2021
Funding
National Research Foundation, Singapore
Keywords
Architecture; Computation/Dataflow Optimization; Natural Language Processing (NLP); Parallel Algorithm; Space Exploration Framework; Time
DOI
10.1109/PACT52795.2021.00013
CLC number
TP3 [Computing Technology, Computer Technology]
Subject classification code
0812
Abstract
Emerging natural language processing (NLP) models have become larger and more complex to provide more sophisticated NLP services. Accordingly, there is a strong demand for scalable and flexible computer infrastructure that can support these large-scale, complex, and diverse NLP models. However, existing proposals cannot provide sufficient scalability and flexibility because they focus only on optimizing specific operations rather than identifying and optimizing the wide spectrum of performance-critical operations appearing in recent NLP models. In this paper, we propose NLP-Fast, a novel system solution to accelerate a wide spectrum of large-scale NLP models. NLP-Fast consists of two main parts: (1) NLP-Perf, an in-depth performance analysis tool that identifies critical operations in emerging NLP models, and (2) NLP-Opt, three end-to-end optimization techniques that accelerate the identified performance-critical operations on various hardware platforms (e.g., CPU, GPU, FPGA). In this way, NLP-Fast can accelerate various types of NLP models on different hardware platforms by identifying their critical operations through NLP-Perf and applying NLP-Opt's holistic optimizations. We evaluate NLP-Fast on CPU, GPU, and FPGA, and the overall throughputs increase by up to 2.92x, 1.59x, and 4.47x over each platform's baseline, respectively. We release NLP-Fast to the community so that users can easily run NLP-Fast's analysis and apply its optimizations to their own NLP applications.
Pages: 75-89 (15 pages)