ParaC: A Domain Programming Framework of Image Processing on GPU Accelerators

Cited by: 0
Authors
Lu X.-J. [1, 2]
Liu L. [1]
Jia H.-P. [1]
Feng X.-B. [1]
Wu C.-G. [1]
Affiliations
[1] State Key Laboratory of Computer Architecture, Institute of Computing Technology, The Chinese Academy of Sciences, Beijing
[2] University of Chinese Academy of Sciences, Beijing
Source
Lu, Xing-Jing (xingjinglu@gmail.com) | Chinese Academy of Sciences | Vol. 28 | p. 1655
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
Compiler optimization; source-to-source translation; domain-specific language; GPGPU accelerator; image processing
DOI
10.13328/j.cnki.jos.005241
Abstract
Image processing algorithms rely on GPU accelerators as their main means of speedup. However, the performance gap between a naïve implementation and a highly optimized one on the same GPU accelerator is frequently an order of magnitude or more. The GPGPU platform has complicated hardware characteristics, such as a large number of multi-dimensional, multi-level threads and a deep memory hierarchy whose levels differ in capacity, bandwidth, latency and access permissions. In addition, image processing algorithms involve complex operations, boundary data access rules and memory access patterns. Consequently, the parallel execution model of tasks, the organization of threads and the mapping of parallel tasks to devices not only strongly affect scalability, scheduling, communication and synchronization, but also determine the efficiency of memory access. In short, optimizing algorithms for GPGPU platforms by hand is difficult, complicated and inefficient. This paper proposes a domain-specific language, ParaC, which provides high-level program semantics through new language extensions. ParaC collects the application's software characteristics, such as operation information, data reuse among parallel tasks and memory access patterns, and combines them with hardware platform information and an optimization mechanism driven by domain prior knowledge to generate high-performance GPGPU code automatically. A source-to-source compiler then outputs standard OpenCL programs. Experimental results on the test cases show that the optimized code generated automatically by ParaC achieves a speedup of 3.22x over the hand-tuned version in the best case, while its line count is only 1.2% to 39.68% of that of the hand-tuned version. © Copyright 2017, Institute of Software, the Chinese Academy of Sciences. All rights reserved.
Pages: 1655-1675
Number of pages: 20