A 16-bit parallel MAC architecture for a multimedia RISC processor
Cited by: 4
Authors:
Kuroda, I [1]
Murata, E [1]
Nadehara, K [1]
Suzuki, K [1]
Arai, T [1]
Okamura, A [1]
Affiliation:
[1] NEC Corp Ltd, C&C Media Res Labs, Kawasaki, Kanagawa, Japan
Source:
1998 IEEE Workshop on Signal Processing Systems (SiPS 98): Design and Implementation | 1998
DOI: 10.1109/SIPS.1998.715773
Chinese Library Classification (CLC): TP3 [Computing technology, computer technology]
Discipline classification code: 0812
Abstract:
This paper presents a parallel MAC (multiply-accumulate) architecture designed for DSP applications on a 200-MHz, 1.6-GOPS multimedia RISC processor. The datapath architecture of the processor is designed to execute a data transfer in parallel with SIMD parallel arithmetic operations. SIMD parallel 16-bit MAC instructions are introduced with a symmetric rounding scheme that maximizes the accuracy of the 16-bit accumulation. This parallel 16-bit MAC instruction on a 64-bit datapath is shown to be used efficiently for DSP applications such as convolution on the multimedia RISC processor. By using the parallel MAC instruction with the symmetric rounding scheme, a 2D-IDCT that satisfies IEEE 1180 can be implemented in 202 cycles.
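The abstract does not give the instruction's exact semantics, so the following is only an illustrative sketch of what a four-lane 16-bit MAC with symmetric rounding might look like when used for a convolution-style sum. Everything in it is an assumption rather than the authors' design: four 16-bit lanes packed into a 64-bit word, Q15 scaling of products (`SHIFT = 15`), "symmetric rounding" read as round-to-nearest with ties away from zero, saturation on overflow, and all function names (`round_symmetric`, `simd_mac16`, `convolve4`).

```python
LANES = 4    # assumed: four 16-bit lanes in one 64-bit word
SHIFT = 15   # assumed: Q15 fixed-point scaling of each product


def round_symmetric(value, shift=SHIFT):
    """Drop `shift` low bits, rounding to nearest with ties away from zero."""
    half = 1 << (shift - 1)
    magnitude = (abs(value) + half) >> shift
    return magnitude if value >= 0 else -magnitude


def round_half_up(value, shift=SHIFT):
    """Add-half-then-shift rounding, shown only for comparison."""
    return (value + (1 << (shift - 1))) >> shift


def saturate16(value):
    """Clamp to the signed 16-bit range (overflow handling is an assumption)."""
    return max(-32768, min(32767, value))


def simd_mac16(acc, a, b):
    """One hypothetical SIMD MAC step: per lane, acc + round(a * b)."""
    return [saturate16(acc[i] + round_symmetric(a[i] * b[i]))
            for i in range(LANES)]


def convolve4(samples, coeffs):
    """Compute four adjacent outputs y[n] = sum_k coeffs[k] * samples[n + k]."""
    acc = [0] * LANES
    for k, c in enumerate(coeffs):
        window = [samples[n + k] for n in range(LANES)]
        acc = simd_mac16(acc, window, [c] * LANES)
    return acc
```

Under these assumptions, a product of exactly +1.5 or -1.5 (in Q15 units) rounds to +2 or -2 with `round_symmetric`, whereas `round_half_up` gives +2 and -1. That asymmetry is the kind of bias a symmetric scheme avoids when many rounded products are accumulated in only 16 bits.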