Machine Learning Hardware Design for Efficiency, Flexibility, and Scalability [Feature]

Cited by: 1
Authors
Zhang, Jie-Fang [1 ]
Zhang, Zhengya [1 ]
Affiliations
[1] Univ Michigan, Dept Elect Engn & Comp Sci, Ann Arbor, MI 48109 USA
Keywords
Surveys; Scalability; Multichip modules; Artificial neural networks; Machine learning; Bandwidth; Tutorials; Design engineering; Hardware design languages; ML hardware; DNN accelerator; sparse DNN architecture; DNN chiplet; heterogeneous integration; DEEP NEURAL-NETWORKS; ACCELERATION; SPARSE;
DOI
10.1109/MCAS.2023.3302390
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline Classification Codes
0808; 0809;
Abstract
The widespread use of deep neural networks (DNNs) and DNN-based machine learning (ML) methods justifies treating DNN computation as a workload class of its own. Beginning with a brief review of DNN workloads and computation, we provide an overview of single instruction multiple data (SIMD) and systolic array architectures. These two basic architectures support the kernel operations for DNN computation, and they form the core of many flexible DNN accelerators. To achieve higher performance and efficiency, sparse DNN hardware can be designed to exploit data sparsity. We present common approaches, from compressed storage to sparse data processing, that reduce memory and bandwidth usage and improve energy efficiency and performance. To accommodate the rapid evolution toward larger and more complex models, modular chiplet integration offers a promising path to meet growing needs. We show recent work on homogeneous tiling and heterogeneous integration to scale up and scale out hardware to support larger models and more complex functions.
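As a rough illustration of the compressed-storage and sparse-processing idea summarized in the abstract (a minimal sketch, not taken from the article; the function names, CSR-style format choice, and threshold parameter are assumptions for illustration only), the following Python example compresses a sparse weight matrix into value/index/pointer arrays and performs a matrix-vector product that touches only nonzero weights, analogous to how sparse accelerators skip ineffectual operands to save memory, bandwidth, and energy.

import numpy as np

def dense_to_csr(w, threshold=0.0):
    # Compress a dense weight matrix into CSR-style arrays
    # (values, column indices, row pointers), keeping only
    # entries whose magnitude exceeds the threshold.
    # Hypothetical helper for illustration, not from the article.
    values, col_idx, row_ptr = [], [], [0]
    for row in w:
        for j, v in enumerate(row):
            if abs(v) > threshold:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return np.array(values), np.array(col_idx), np.array(row_ptr)

def csr_matvec(values, col_idx, row_ptr, x):
    # Sparse matrix-vector product: only stored (nonzero) weights
    # are fetched and multiplied, mirroring how sparse DNN hardware
    # avoids work on zero operands.
    y = np.zeros(len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y

# Usage: a roughly 70%-sparse 4x8 weight matrix; the compressed
# product matches the dense result while storing far fewer weights.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)) * (rng.random((4, 8)) > 0.7)
x = rng.standard_normal(8)
vals, cols, ptrs = dense_to_csr(w)
assert np.allclose(csr_matvec(vals, cols, ptrs, x), w @ x)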
Pages: 35-53
Page count: 19