AMOS: Enabling Automatic Mapping for Tensor Computations On Spatial Accelerators with Hardware Abstraction

Cited by: 35
Authors
Zheng, Size [1 ]
Chen, Renze [1 ]
Wei, Anjiang [2 ]
Jin, Yicheng [2 ]
Han, Qin [2 ]
Lu, Liqiang [2 ]
Wu, Bingyang [2 ]
Li, Xiuhong [3 ,4 ]
Yan, Shengen [3 ]
Liang, Yun [1 ]
Affiliations
[1] Peking University, Beijing, China
[2] Stanford University, Stanford, CA, USA
[3] SenseTime Research, Beijing, China
[4] Shanghai Lab, Shanghai, China
Source
Proceedings of the 49th Annual International Symposium on Computer Architecture (ISCA '22), 2022
Funding
National Natural Science Foundation of China;
Keywords
spatial accelerators; code generation; mapping; tensor computations;
DOI
10.1145/3470496.3527440
Chinese Library Classification
TP3 [Computing Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Hardware specialization is a promising trend to sustain performance growth. Spatial hardware accelerators that employ specialized and hierarchical computation and memory resources have recently shown high performance gains for tensor applications such as deep learning, scientific computing, and data mining. To harness the power of these accelerators, programmers have to use specialized instructions with certain hardware constraints. However, these accelerators and instructions are quite new, and there is a lack of understanding of the hardware abstraction, the performance optimization space, and automatic methodologies to explore that space. Existing compilers use hand-tuned computation implementations and optimization templates, resulting in sub-optimal performance and heavy development costs. In this paper, we propose AMOS, an automatic compilation framework for spatial hardware accelerators. Central to this framework is a hardware abstraction that not only clearly specifies the behavior of spatial hardware instructions, but also formally defines the mapping problem from software to hardware. Based on this abstraction, we develop algorithms and performance models to explore various mappings automatically. Finally, we build a compilation framework that uses the hardware abstraction as its compiler intermediate representation (IR), explores both compute mappings and memory mappings, and generates high-performance code for different hardware backends. Our experiments show that AMOS achieves more than 2.50x speedup over hand-optimized libraries on Tensor Cores, 1.37x speedup over TVM on Intel CPU vector units with AVX-512, and up to 25.04x speedup over AutoTVM on the dot units of Mali GPUs. The source code of AMOS is publicly available.
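
The abstract's core technical idea is a formal mapping from the iteration space of a software tensor computation onto the iterators of a spatial hardware instruction, with candidate mappings ranked by a performance model. The Python sketch below is a minimal, purely illustrative rendering of that idea, not AMOS's actual IR or API: the intrinsic shape (16x16x8), the names INTRINSIC_DIMS, enumerate_mappings, and cost, and the cost function itself are assumptions made for this example.

# Illustrative sketch only (not AMOS's real API): assign the loop iterators of a
# tensor computation to the iterators of a spatial hardware instruction and
# pick the assignment a simple cost model prefers.
from itertools import permutations
from math import ceil

# Hypothetical spatial instruction: a 16x16x8 matrix-multiply-accumulate tile,
# loosely inspired by Tensor Core fragments (shape chosen for illustration).
INTRINSIC_DIMS = {"i": 16, "j": 16, "k": 8}

def enumerate_mappings(software_iters):
    """Yield every bijective assignment of intrinsic iterators to software iterators."""
    for perm in permutations(software_iters):
        yield dict(zip(INTRINSIC_DIMS, perm))

def cost(mapping, extents):
    """Toy cost model: number of intrinsic invocations plus a small padding penalty."""
    calls, waste = 1, 0
    for hw_iter, sw_iter in mapping.items():
        tiles = ceil(extents[sw_iter] / INTRINSIC_DIMS[hw_iter])
        calls *= tiles
        waste += tiles * INTRINSIC_DIMS[hw_iter] - extents[sw_iter]
    return calls + 0.01 * waste

# Software view: C[m, n] += A[m, k] * B[k, n] with these loop extents.
extents = {"m": 512, "n": 1000, "k": 64}

best = min(enumerate_mappings(extents), key=lambda m: cost(m, extents))
print("best mapping (intrinsic iter -> software iter):", best)
print("cost:", cost(best, extents))

In AMOS itself the explored space also covers memory mappings and relies on the performance models mentioned in the abstract rather than this toy cost function; the sketch only shows why the choice of iterator mapping changes the number of intrinsic invocations and the padding overhead.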
Pages: 874-887
Page count: 14