The Progress and Trends of FPGA-Based Accelerators in Deep Learning

Cited by: 0
Authors
Wu Y.-X. [1 ]
Liang K. [1 ,2 ]
Liu Y. [2 ]
Cui H.-M. [2 ]
Affiliations
[1] College of Computer Science and Technology, Harbin Engineering University, Harbin
[2] State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing
Source
Jisuanji Xuebao/Chinese Journal of Computers | 2019 / Vol. 42 / No. 11
Keywords
CPU-FPGA; Deep learning; FPGA; Hardware accelerator; Neural network;
DOI
10.11897/SP.J.1016.2019.02461
Abstract
With the coming of the big data era, deep learning plays a critical role in extracting meaningful information from massive data, and it has been widely applied in domains such as computer vision, speech recognition, and natural language processing. Deep learning algorithms have a large number of parameters and consist mostly of matrix multiplications and multiply-accumulate operations within a simple computing model. At the same time, as research and industry demand ever higher accuracy, the models that win image classification and object detection contests have grown increasingly complex, with ever more weights, so speeding up deep learning inference and training has become much more important. This paper reviews one approach to this problem: accelerating deep learning on FPGAs. First, it introduces deep learning algorithms and their characteristics, especially convolutional neural networks and recurrent neural networks, and explains why FPGAs fit this problem, what benefits they offer compared with CPU-only or GPU solutions, and what is needed to realize those benefits. It then analyzes the challenges of accelerating deep learning on FPGAs. CPU-FPGA is currently the most popular architecture, but there are different ways to build a usable platform, and for CPU-FPGA platforms data communication is one of the key factors affecting acceleration performance. On this basis, the paper introduces approaches based on SoC FPGAs and on standard FPGAs, and comparatively analyzes how CPU-FPGA data communication differs between the two. For different engineers and developers, the development environments and tools for accelerating deep learning on FPGAs are presented from the perspectives of high-level languages, such as C and OpenCL, and hardware description languages, such as Verilog. Hardware description languages make it possible to apply techniques that require complex low-level hardware control to improve the performance of FPGA implementations, but they demand a working knowledge of digital design and circuits. In contrast, high-level languages give engineers and developers more freedom to explore algorithms instead of designing hardware architectures, which suits researchers who need to iterate quickly through design cycles and software developers with no digital design background. On this foundation, FPGA-based accelerators for deep learning algorithms are reviewed in detail in terms of hardware architecture, design methods, and optimization strategies. Three typical hardware architectures are presented for convolutional neural networks, and the hardware architectures of two typical recurrent neural network models are introduced. Design methods are reviewed in terms of the accelerator's goals, how the algorithms are modeled, and how the best solution is found. Optimization strategies are presented from the perspectives of processing and memory communication, which are critical factors for accelerator performance. Finally, the paper looks ahead at research on FPGA-accelerated deep learning from the perspectives of development environments, communication between CPU and FPGA, better model compression methods, and cloud applications with FPGAs. © 2019, Science Press. All rights reserved.
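To make the computation pattern concrete: below is a minimal OpenCL C sketch of the multiply-accumulate core of one convolutional layer, the kind of kernel the high-level-language route (e.g. an OpenCL-based FPGA toolchain) starts from before applying tiling and pipelining optimizations. The names, data layout, and loop structure are illustrative assumptions, not code from the surveyed work.

```c
/* Minimal OpenCL C sketch of the multiply-accumulate core of one
 * convolutional layer: each work-item computes one output pixel of one
 * output channel.  Layout and bounds (IC input channels, K x K kernel,
 * H x W output, input padded to (H+K-1) x (W+K-1)) are assumptions. */
__kernel void conv2d_naive(__global const float *restrict input,   /* IC x inH x inW  */
                           __global const float *restrict weights, /* OC x IC x K x K */
                           __global float *restrict output,        /* OC x H x W      */
                           const int IC, const int K,
                           const int H, const int W)
{
    const int x   = get_global_id(0);  /* output column      */
    const int y   = get_global_id(1);  /* output row         */
    const int oc  = get_global_id(2);  /* output channel     */
    const int inW = W + K - 1;         /* padded input width */
    const int inH = H + K - 1;         /* padded input height*/

    float acc = 0.0f;
    for (int ic = 0; ic < IC; ic++)           /* input channels */
        for (int ky = 0; ky < K; ky++)        /* kernel rows    */
            for (int kx = 0; kx < K; kx++) {  /* kernel columns */
                float in = input[(ic * inH + (y + ky)) * inW + (x + kx)];
                float wt = weights[((oc * IC + ic) * K + ky) * K + kx];
                acc += in * wt;               /* the MAC operation that dominates */
            }
    output[(oc * H + y) * W + x] = acc;
}
```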
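And to make the CPU-FPGA data-communication point concrete: the host-side sketch below shows the explicit copies that move data between CPU memory and a PCIe-attached (standard) FPGA board through the OpenCL runtime; on SoC FPGA platforms, where the CPU and the fabric share memory, such copies can be avoided. The device type, buffer size, and blocking transfers are assumptions for illustration, and the kernel launch itself is elided.

```c
/* Host-side sketch of the CPU -> FPGA -> CPU data path; error handling
 * is reduced to asserts and the kernel launch is elided.  Link with
 * -lOpenCL.  These enqueue calls are the PCIe traffic that batching and
 * on-chip data reuse try to amortize on standard-FPGA platforms. */
#include <assert.h>
#include <stdlib.h>
#include <CL/cl.h>

int main(void)
{
    enum { N = 1 << 20 };                      /* 1 M floats, ~4 MB (assumed) */
    float *host_in  = calloc(N, sizeof(float));
    float *host_out = calloc(N, sizeof(float));
    assert(host_in && host_out);

    cl_platform_id plat;  cl_device_id dev;  cl_int err;
    err = clGetPlatformIDs(1, &plat, NULL);                       assert(err == CL_SUCCESS);
    err = clGetDeviceIDs(plat, CL_DEVICE_TYPE_ACCELERATOR,        /* FPGA boards enumerate */
                         1, &dev, NULL);                          /* as accelerators       */
    assert(err == CL_SUCCESS);

    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, &err);
    assert(err == CL_SUCCESS);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, &err);
    assert(err == CL_SUCCESS);

    cl_mem dev_buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE, N * sizeof(float), NULL, &err);
    assert(err == CL_SUCCESS);

    /* Host -> device copy over PCIe (blocking). */
    err = clEnqueueWriteBuffer(q, dev_buf, CL_TRUE, 0, N * sizeof(float), host_in, 0, NULL, NULL);
    assert(err == CL_SUCCESS);

    /* ... kernel launch would go here ... */

    /* Device -> host copy of the results (blocking). */
    err = clEnqueueReadBuffer(q, dev_buf, CL_TRUE, 0, N * sizeof(float), host_out, 0, NULL, NULL);
    assert(err == CL_SUCCESS);

    clReleaseMemObject(dev_buf);  clReleaseCommandQueue(q);
    clReleaseContext(ctx);  free(host_in);  free(host_out);
    return 0;
}
```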
Pages: 2461-2480
Page count: 19