Machine Learning Inference Framework on Multi-Core Processor

Cited by: 0
Authors
Zhang X. [1 ,2 ,3 ]
Zhi T. [1 ,3 ]
Affiliations
[1] Institute of Computing Technology, Chinese Academy of Sciences, Beijing
[2] University of Chinese Academy of Sciences, Beijing
[3] Cambricon Tech.Ltd., Shanghai
Source
Jisuanji Yanjiu yu Fazhan/Computer Research and Development | 2019, Vol. 56, No. 09
Funding
National Natural Science Foundation of China
Keywords
Deep learning framework; Low-latency inference; Multi-core processor; Operation splitting; Recurrent neural network;
DOI
10.7544/issn1000-1239.2019.20180786
Abstract
In recent years, deep neural networks have been widely used in many domains and have achieved great success. As the size and computational workload of neural network models grow rapidly, GPUs and many newly designed domain-specific accelerators have been adopted to compute neural networks as quickly as possible. However, the traditional general-purpose processor should not be ignored: since it is ubiquitous and easy to obtain, exploring efficient ways to use general-purpose processors for deep learning is worthwhile. In the training phase, the multi-core architecture suits data parallelism, which increases system throughput. In the inference phase, however, end-to-end latency matters far more than throughput, and traditional data parallelism cannot meet the requirements of small batch sizes and low latency. To utilize the hardware resources of a multi-core architecture, the computation task must be split into smaller parts that can execute on the multi-core processor in parallel, and a sophisticated strategy is needed to ensure that the splitting plan does not reduce computing efficiency on each core. In this paper, we propose a parallel framework for multi-core general-purpose processors. It divides each operation in the neural network into smaller ones and executes them on multiple cores in parallel. By providing a few necessary assistant operations, the framework can be easily ported to support future multi-core processors. The framework also automatically generates an effective splitting plan for a given neural network; the plan takes full account of both the network architecture and the underlying hardware. Experimental results show that the framework produces an efficient splitting plan that substantially reduces the end-to-end latency of inference tasks on a multi-core processor. © 2019, Science Press. All rights reserved.
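The operation-splitting idea described in the abstract can be sketched minimally: a single operation (here a naive matrix multiply) is partitioned along its output-row dimension and the pieces are dispatched to separate cores, then concatenated back. This is an illustrative sketch only; the function names and the equal-chunk splitting heuristic are assumptions, not the authors' actual framework, which derives its splitting plan from both the network architecture and the hardware.

```python
# Hypothetical sketch of operation splitting for multi-core inference.
# All names and the splitting strategy are illustrative assumptions.
from concurrent.futures import ThreadPoolExecutor

def matmul(a, b):
    # Naive matmul: a is m x k, b is k x n; returns m x n.
    m, k, n = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]

def split_rows(a, parts):
    # Split the output-row dimension into roughly equal chunks,
    # one chunk per core.
    step = (len(a) + parts - 1) // parts
    return [a[i:i + step] for i in range(0, len(a), step)]

def parallel_matmul(a, b, cores=4):
    # Each chunk is an independent smaller matmul, executed in parallel.
    chunks = split_rows(a, cores)
    with ThreadPoolExecutor(max_workers=cores) as pool:
        results = pool.map(lambda chunk: matmul(chunk, b), chunks)
    # Concatenate the partial outputs back into the full result.
    return [row for part in results for row in part]
```

Because each chunk produces a disjoint slice of the output, no reduction step is needed; splitting along a shared dimension (e.g. the inner k dimension) would instead require an assistant operation to sum partial results, which is the kind of helper the framework supplies.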
Pages: 1977-1987
Page count: 10