NLP-Fast: A Fast, Scalable, and Flexible System to Accelerate Large-Scale Heterogeneous NLP Models

Cited by: 4
Authors
Kim, Joonsung [1 ]
Hur, Suyeon [1 ]
Lee, Eunbok [1 ]
Lee, Seungho [1 ]
Kim, Jangwoo [1 ]
Affiliations
[1] Seoul Natl Univ, Dept Elect & Comp Engn, Seoul, South Korea
Source
30TH INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES (PACT 2021) | 2021
Funding
National Research Foundation, Singapore
Keywords
Architecture; Computation/Dataflow Optimization; Natural Language Processing (NLP); Parallel Algorithm; Space Exploration Framework; Time
DOI
10.1109/PACT52795.2021.00013
CLC number
TP3 [Computing Technology, Computer Technology]
Subject classification code
0812
Abstract
Emerging natural language processing (NLP) models have become larger and more complex to provide more sophisticated NLP services. Accordingly, there is a strong demand for scalable and flexible computer infrastructure that can support these large-scale, complex, and diverse NLP models. However, existing proposals cannot provide sufficient scalability and flexibility because they focus only on optimizing specific operations rather than identifying and optimizing the wide spectrum of performance-critical operations appearing in recent NLP models. In this paper, we propose NLP-Fast, a novel system solution to accelerate a wide spectrum of large-scale NLP models. NLP-Fast consists of two main parts: (1) NLP-Perf, an in-depth performance analysis tool that identifies critical operations in emerging NLP models, and (2) NLP-Opt, three end-to-end optimization techniques that accelerate the identified performance-critical operations on various hardware platforms (e.g., CPU, GPU, FPGA). In this way, NLP-Fast can accelerate various types of NLP models on different hardware platforms by identifying their critical operations through NLP-Perf and applying NLP-Opt's holistic optimizations. We evaluate NLP-Fast on CPU, GPU, and FPGA, and the overall throughputs increase by up to 2.92x, 1.59x, and 4.47x over each platform's baseline, respectively. We release NLP-Fast to the community so that users can easily run NLP-Fast's analysis and apply its optimizations to their own NLP applications.
Pages: 75-89 (15 pages)