Merging VLIW and vector processing techniques for a simple, high-performance processor architecture

Cited by: 0

Authors:

Soliman, Mostafa I. [1, 2]

Affiliations:

[1] Taibah Univ, Comp Sci & Informat Dept, Community Coll, Al Madinah Al Munawwarah 2898, Saudi Arabia

[2] Aswan Univ, Dept Elect Engn, Comp & Syst Sect, Fac Engn, Aswan 81542, Egypt

Source:

MICROELECTRONICS JOURNAL | 2015 / Vol. 46 / Issue 07

Keywords:

Data-parallel applications; VLIW; Vector processing; VHDL; Performance evaluation; SUPERSCALAR; LEVEL; CORE;

DOI:

10.1016/j.mejo.2015.03.012

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

This paper proposes a new processor architecture called VVSHP for accelerating data-parallel applications, which are growing in importance and demanding increased performance from hardware. VVSHP merges VLIW and vector processing techniques for a simple, high-performance processor architecture. One key feature of VVSHP is that it executes both the multiple scalar instructions packed in a VLIW and vector instructions on unified parallel execution datapaths. Another key feature is a two-part register file that reduces the complexity of VVSHP: (1) a shared scalar-vector part with eight read ports and four write ports, holding 64 x 32-bit registers (64 scalar or 16 x 4 vector registers) for scalar/vector data, and (2) a vector part with two read ports and one write port, holding 48 vector registers, each storing 4 x 32-bit vector data. Moreover, VVSHP processes vector data with lengths varying from 1 to 256, which reduces loop overhead. VVSHP can issue up to four scalar/vector operations in each cycle, processing a set of operands in parallel and producing up to four results that are written back into the VVSHP register file. However, it cannot issue more than one memory operation at a time; each memory operation loads/stores 128-bit scalar/vector data from/to data memory. The proposed VVSHP processor is implemented in VHDL targeting the Xilinx Virtex-5 FPGA, and its performance is evaluated. (C) 2015 Elsevier Ltd. All rights reserved.
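
The figures quoted in the abstract can be cross-checked with a little arithmetic. The Python sketch below is an illustration only, not the paper's performance model: the function names (`register_file_bits`, `strip_mine`, `memory_ops`) are hypothetical, and `memory_ops` assumes each 128-bit memory operation moves four contiguous 32-bit elements with no stalls, so it gives a lower bound.

```python
import math

# Constants taken from the abstract.
ELEM_BITS = 32      # width of one scalar element
LANES = 4           # 32-bit elements per vector register
MAX_VL = 256        # maximum vector length
MEM_OP_BITS = 128   # width of the single memory operation issued at a time


def register_file_bits():
    """Total storage of the two-part register file, in bits."""
    shared = 64 * ELEM_BITS            # 64 scalar or 16 x 4 vector registers
    vector = 48 * LANES * ELEM_BITS    # 48 vector registers of 4 x 32 bits
    return shared + vector


def strip_mine(n):
    """Split an n-element loop into vector lengths of at most MAX_VL.

    Hypothetical helper: each entry is the vector length one vector
    instruction would use, so len(result) is the number of loop iterations.
    """
    return [min(MAX_VL, n - start) for start in range(0, n, MAX_VL)]


def memory_ops(vl):
    """Lower bound on 128-bit memory operations needed to move vl elements."""
    elems_per_op = MEM_OP_BITS // ELEM_BITS
    return math.ceil(vl / elems_per_op)


if __name__ == "__main__":
    print(register_file_bits())
    print(strip_mine(1000))
    print(memory_ops(MAX_VL))
```

Under these assumptions, a long vector length cuts the number of loop iterations, while the one-memory-operation limit still bounds how quickly the operands of a maximum-length vector can be loaded.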

Pages: 637-655

Page count: 19
