DataScalar: A memory-centric approach to computing
Authors:
Kaxiras, Stefanos [1]
Burger, Doug [1]
Goodman, James R. [1]
Affiliation:
[1] University of Wisconsin-Madison, Comp. Sciences, 1210 W. Dayton St., Madison, WI 53705, United States
Keywords: Computer simulation; Computer systems programming; Data storage equipment; Microprocessor chips; Parallel processing systems
Abstract:
Commodity microprocessors contain more on-chip memory with each successive generation and will contain tens of megabytes within the decade. We describe a novel architecture that runs an unmodified uniprocessor program across multiple nodes, each of which tightly integrates a processor with a sizable memory. Instruction execution is replicated on every node, while operand accesses are distributed across the nodes: each node reads the operands that reside in its fast local memory and broadcasts them to the other nodes. By exploiting out-of-order execution and the integration of processor and memory on each chip, this architecture runs memory-intensive, hard-to-parallelize programs more efficiently. We describe an implementation with specific solutions to the unique problems this architecture poses, and we compare simulation results for this implementation against more traditional equivalent systems. In our simulations, five unmodified SPEC95 binaries ran, in most cases, considerably faster than on systems with conventional memory systems.
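The core mechanism described above (every node executes the same instruction stream, but only the node that owns an operand reads it and broadcasts it to the others) can be illustrated with a toy lockstep simulation. This is a minimal sketch under stated assumptions, not the paper's implementation: it assumes addresses are interleaved across nodes by `addr % NUM_NODES`, that broadcasts arrive in program order, and that a store is performed only by the owning node with no message (since every node already computes the stored value). It ignores out-of-order execution and timing entirely. All names (`Node`, `owner_of`, `make_nodes`, `run`) and the three-operation instruction format are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field

NUM_NODES = 4  # hypothetical node count


def owner_of(addr: int) -> int:
    # Assumption: addresses are statically interleaved across nodes.
    return addr % NUM_NODES


@dataclass
class Node:
    node_id: int
    memory: dict                                   # only the words this node owns
    regs: dict = field(default_factory=dict)       # replicated register state
    inbox: deque = field(default_factory=deque)    # operands broadcast by other nodes


def make_nodes(initial: dict) -> list:
    # Distribute each initial memory word to the node that owns its address.
    nodes = [Node(i, {}) for i in range(NUM_NODES)]
    for addr, val in initial.items():
        nodes[owner_of(addr)].memory[addr] = val
    return nodes


def run(program: list, nodes: list) -> int:
    """Execute `program` on every node in lockstep; return the number of broadcasts."""
    broadcasts = 0
    for op, *args in program:
        if op == "load":
            reg, addr = args
            owner = nodes[owner_of(addr)]
            value = owner.memory[addr]             # fast local access on the owning node
            for node in nodes:
                if node is not owner:
                    node.inbox.append(value)       # owner broadcasts the operand
            broadcasts += 1
            for node in nodes:                     # non-owners consume the broadcast
                node.regs[reg] = value if node is owner else node.inbox.popleft()
        elif op == "add":
            dst, a, b = args
            for node in nodes:                     # replicated execution on every node
                node.regs[dst] = node.regs[a] + node.regs[b]
        elif op == "store":
            reg, addr = args
            owner = nodes[owner_of(addr)]
            # Assumption: all nodes hold the value already, so only the owner writes.
            owner.memory[addr] = owner.regs[reg]
        else:
            raise ValueError(f"unknown op {op!r}")
    return broadcasts


if __name__ == "__main__":
    nodes = make_nodes({0: 10, 1: 20, 2: 30, 5: 0})
    program = [
        ("load", "r1", 0),
        ("load", "r2", 1),
        ("add", "r3", "r1", "r2"),
        ("load", "r4", 2),
        ("add", "r5", "r3", "r4"),
        ("store", "r5", 5),
    ]
    sent = run(program, nodes)
    print(sent, nodes[owner_of(5)].memory[5])  # prints "3 60"
```

In this toy run each load costs exactly one broadcast and the store costs none, yet every node finishes with identical register contents while holding only a quarter of the memory words.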