Larrabee: A many-core x86 architecture for visual computing

Cited by: 294

Authors:

Seiler, Larry ^{[1]}

Carmean, Doug ^{[1]}

Sprangle, Eric ^{[1]}

Forsyth, Tom ^{[1]}

Abrash, Michael

Dubey, Pradeep ^{[1]}

Junkins, Stephen ^{[1]}

Lake, Adam ^{[1]}

Sugerman, Jeremy ^{[2]}

Cavin, Robert ^{[1]}

Espasa, Roger ^{[1]}

Grochowski, Ed ^{[1]}

Juan, Toni ^{[1]}

Hanrahan, Pat ^{[2]}

Affiliations:

[1] Intel Corp, Santa Clara, CA 95051 USA

[2] Stanford Univ, Stanford, CA 94305 USA

Source:

ACM TRANSACTIONS ON GRAPHICS | 2008 / Vol. 27 / No. 3

Keywords:

graphics architecture; many-core computing; real-time graphics; software rendering; throughput computing; visual computing; parallel processing; SIMD; GPGPU

DOI:

10.1145/1360612.1360617

Chinese Library Classification:

TP31 [Computer Software]

Discipline classification codes:

081202; 0835

Abstract:

This paper presents a many-core visual computing architecture code-named Larrabee, a new software rendering pipeline, a many-core programming model, and performance analysis for several applications. Larrabee uses multiple in-order x86 CPU cores that are augmented by a wide vector processor unit, as well as some fixed-function logic blocks. This provides dramatically higher performance per watt and per unit of area than out-of-order CPUs on highly parallel workloads. It also greatly increases the flexibility and programmability of the architecture compared to standard GPUs. A coherent on-die second-level cache allows efficient inter-processor communication and high-bandwidth local data access by CPU cores. Task scheduling is performed entirely in software in Larrabee, rather than in fixed-function logic. The customizable software graphics rendering pipeline for this architecture uses binning to reduce required memory bandwidth, minimize lock contention, and increase opportunities for parallelism relative to standard GPUs. The Larrabee native programming model supports a variety of highly parallel applications that use irregular data structures. Performance analysis on those applications demonstrates Larrabee's potential for a broad range of parallel computation.


Pages: 15
