Design Space Exploration of the Turbo Decoding Algorithm on GPUs

被引：12

作者：

Lee, Dongwon ^{[1
]}

Wolf, Marilyn ^{[1
]}

Kim, Hyesoon ^{[2
]}

机构：

[1] Georgia Inst Technol, Sch ECE, Atlanta, GA 30332 USA

[2] Georgia Inst Technol, Sch Comp Sci, Atlanta, GA 30332 USA

来源：

PROCEEDINGS OF THE 2010 INTERNATIONAL CONFERENCE ON COMPILERS, ARCHITECTURES AND SYNTHESIS FOR EMBEDDED SYSTEMS (CASES '10) | 2010年

关键词：

GPU; Turbo decoding; Design space exploration; CODES; IMPLEMENTATION;

D O I：

10.1145/1878921.1878953

中图分类号：

TP3 [计算技术、计算机技术];

学科分类号：

0812 ;

摘要：

In this paper, we explore the design space of the Turbo decoding algorithm on GPUs and find a performance bottleneck. We consider three axes for the design space exploration: a radix degree, a parallelization method, and the number of sub-frames per thread block. In Turbo decoding, a degree of radix affects computational complexity and memory access patterns in both algorithmic and implementation viewpoints. Second, computations of branch metrics (BMs) and state metrics (SMs) have a different degree of parallelism, which affects the mapping method of computational tasks to GPU threads. Finally, we can easily adjust the number of sub-frames per thread block to balance the occupancy and memory access traffic. Experimental results show that the radix-4 algorithm with the SM-centric mapping method shows the best performance at four sub-frames per thread block. According to our analysis, two factors - the occupancy and shared memory bank conflicts - differentiate the performance of different cases in the design space. We show further performance improvements by optimizing a kernel operation (max*) and applying the MAX-Log-Maximum A Posteriori (MAP) algorithm. A performance bottleneck at the finally optimized case is global memory access latency. Since the most optimized performance is comparable to that of the other programmable platforms, the GPU can be considered as another type of coprocessor for Turbo decoding implementations in mobile devices.

引用

页码：217 / 226

页数：10

共 17 条

[1]

3rd Generation Partnership Project (3GPP), 1999, 3 GENERATION PARTNER

[2] OPTIMAL DECODING OF LINEAR CODES FOR MINIMIZING SYMBOL ERROR RATE [J].