Xuantie-910: A Commercial Multi-Core 12-Stage Pipeline Out-of-Order 64-bit High Performance RISC-V Processor with Vector Extension

Times cited: 65
Authors
Chen, Chen [1]
Xiang, Xiaoyan [1]
Liu, Chang [1]
Shang, Yunhai [1]
Guo, Ren [1]
Liu, Dongqi [1]
Lu, Yimin [1]
Hao, Ziyi [1]
Luo, Jiahui [1]
Chen, Zhijian [1]
Li, Chunqiang [1]
Pu, Yu [1]
Meng, Jianyi [1]
Yan, Xiaolang [1]
Xie, Yuan [1]
Qi, Xiaoning [1]
Affiliation
[1] Alibaba Cloud, T-Head Division, Hangzhou, People's Republic of China
Source
2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA 2020), 2020
Keywords
RISC-V; multi-core; cache; memory architectures; out of order; vector; extension
DOI
10.1109/ISCA45697.2020.00016
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
The open-source RISC-V ISA has been quickly gaining momentum. This paper presents Xuantie-910, an industry-leading 64-bit high-performance embedded RISC-V processor from Alibaba's T-Head division. It is fully based on the RV64GCV instruction set and features custom extensions for arithmetic operations, bit manipulation, loads and stores, and TLB and cache operations. It also implements the 0.7.1 stable release of the RISC-V vector extension specification for high-efficiency vector processing. Xuantie-910 supports multi-core, multi-cluster SMP with cache coherence. Each cluster contains 1 to 4 cores capable of booting the Linux operating system. Each core uses a state-of-the-art 12-stage deep pipeline with an out-of-order, multi-issue superscalar architecture, reaching a maximum clock frequency of 2.5 GHz under typical process, voltage, and temperature (PVT) conditions in TSMC 12nm FinFET process technology. Each core, including its vector execution unit, occupies an area of 0.8 mm² (excluding the L2 cache). The toolchain is enhanced significantly to support the vector extension and the custom extensions. Through hardware and toolchain co-optimization, Xuantie-910 to date delivers the highest performance (in terms of IPC, speed, and power efficiency) among its RISC-V predecessors on a number of industrial control-flow and data-computing benchmarks. An FPGA implementation of Xuantie-910 has been deployed in Alibaba Cloud data centers for application-specific acceleration (e.g., blockchain transactions). ASIC deployment in low-cost SoC applications, such as IoT endpoints and edge computing, is planned to support Alibaba's end-to-end, cloud-to-edge computing infrastructure.
Pages: 52-64
Page count: 13