DEEPTOOLS: Compiler and Execution Runtime Extensions for RAPiD AI Accelerator

Cited by: 18
Authors
Venkataramani, Swagath [1]
Choi, Jungwook [1]
Srinivasan, Vijayalakshmi [1]
Wang, Wei [1]
Zhang, Jintao [1]
Schaal, Marcel [1]
Serrano, Mauricio J. [1]
Ishizaki, Kazuaki [1]
Inoue, Hiroshi [1]
Ogawa, Eri [1]
Ohara, Moriyoshi [1]
Chang, Leland [1]
Gopalakrishnan, Kailash [1]
Affiliations
[1] IBM Research Labs, Yorktown Heights, NY 10598 USA
Keywords
Deep learning; Machine learning accelerators; Software stack for AI
DOI
10.1109/MM.2019.2931584
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The ubiquitous adoption of systems specialized for AI requires bridging two seemingly conflicting challenges: the need to deliver extreme processing efficiencies while employing familiar programming interfaces, making them compelling even for nonexpert users. We take a significant first step toward this goal and present an end-to-end software stack for the RAPiD AI accelerator developed by IBM Research. We present a set of software extensions, called DEEPTOOLS, that leverage and work within popular deep learning frameworks. DEEPTOOLS requires no additional user input and enables aggressive, accelerator-specific performance optimization akin to a full, custom framework. DEEPTOOLS has two key components: 1) a compiler runtime called DeepRT, which automatically identifies how best to execute a given DNN graph on RAPiD and constructs the requisite program binaries; and 2) an execution runtime called RAPiDLiB, which triggers and manages the execution of compute and data-transfer operations on RAPiD. We integrate DEEPTOOLS with TensorFlow and map popular DNNs (AlexNet, VGG, ResNet, LSTM) to RAPiD. We demonstrate substantial improvement in performance over hand-tuned mappings.
Pages: 102-111
Page count: 10