Architecture of programmable systolic array processor for discrete wavelet transform
Authors:
Miyake, Jiro [1]
Kuninobu, Shigeo [2]
Baba, Takaaki [1]
Affiliations:
[1] Graduate School of Information, Production and Systems, Waseda University, Wakamatsu-ku, Kitakyushu-shi, 808-0135
[2] Information, Production and Systems Research Center, Waseda University, Wakamatsu-ku, Kitakyushu-shi, 808-0135
Source:
Kyokai Joho Imeji Zasshi/Journal of the Institute of Image Information and Television Engineers
2009, Vol. 63, No. 12
Keywords:
Computer architecture - Adders - Data transfer - Signal reconstruction - Scalability - Systolic arrays
DOI:
10.3169/itej.63.1853
Abstract:
An architecture for a programmable systolic array processor is proposed for the discrete wavelet transform (DWT). The DWT requires filtering a very large amount of data, so many processor elements (PEs) are implemented. However, a multiplier for multiply-accumulate operations occupies a large hardware area, and complicated connections among PEs reduce flexibility and scalability. With the time-divided multiple-operation method, an execution unit with a simple structure of shifters and a three-input adder achieves the same performance as a unit built from a multiplier and an adder at 50% of the hardware size. The unique network mechanism among PEs and the systolic array architecture provide high data-transfer capability, flexibility, and scalability. With this architecture, a processor with ten PEs executes the DWT of a 1024×1024-pixel image in 26.3 ms.
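
The abstract does not detail how the time-divided multiple-operation method works. As a rough illustration only, the Python sketch below shows one generic way a multiply-accumulate can be replaced by shifts and a three-input adder: the coefficient is decomposed into its set bits, and each time slot adds up to two shifted copies of the input to the running sum. The function name `shift_add_mac`, the non-negative integer coefficient format, and the two-shifts-per-slot schedule are all assumptions made for this sketch, not the authors' design.

```python
def shift_add_mac(acc, x, coeff):
    """Return (acc + x * coeff, slots) using only shifts and three-input adds.

    Hypothetical model, not the paper's circuit: `coeff` is a non-negative
    integer coefficient, and each loop iteration stands for one time slot
    in which a three-input adder sums the running total and up to two
    shifted copies of `x`.
    """
    if coeff < 0:
        raise ValueError("this sketch only handles non-negative coefficients")

    # Bit positions where the coefficient has a 1; each contributes x << position.
    shifts = [i for i in range(coeff.bit_length()) if (coeff >> i) & 1]

    total = acc
    slots = 0
    for k in range(0, len(shifts), 2):
        first = x << shifts[k]
        # The second adder input is zero when an odd number of terms remains.
        second = x << shifts[k + 1] if k + 1 < len(shifts) else 0
        total = total + first + second  # one three-input addition
        slots += 1
    return total, slots


# Example: 10 + 3 * 11, where 11 = 0b1011 has three set bits (two time slots).
result, slots = shift_add_mac(10, 3, 0b1011)
print(result, slots)
```

In this model the number of time slots grows with the number of set bits in the coefficient, which is the usual trade-off when a multiplier is replaced by shift-and-add logic; how the actual processor schedules these operations is not stated in the abstract.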