61 entries in total
- [1] Han TD, Abdelrahman TS (2011) hiCUDA: high-level GPGPU programming. IEEE Trans Parallel Distrib Syst 22:78-90
- [2] Wang Z, Grewe D, O'Boyle MFP (2015) Automatic and portable mapping of data parallel programs to OpenCL for GPU-based heterogeneous systems. ACM Trans Archit Code Optim 11:1-26
- [3] Chamberlain BL, Callahan D, Zima HP (2007) Parallel programmability and the Chapel language. Int J High Perform Comput Appl 21:291-312
- [4] Dean J, Ghemawat S (2008) MapReduce: simplified data processing on large clusters. Commun ACM 51:107-113
- [5] Catanzaro B et al (2011) Copperhead: compiling an embedded data parallel language. ACM SIGPLAN Not 46:47-56
- [6] Szafaryn LG et al (2013) Trellis: portability across architectures with a high-level framework. J Parallel Distrib Comput 73:1400-1413
- [7] Edwards HC et al (2014) Kokkos: enabling manycore performance portability through polymorphic memory access patterns. J Parallel Distrib Comput 74:3202-3216
- [8] Klöckner A et al (2012) PyCUDA and PyOpenCL: a scripting-based approach to GPU run-time code generation. Parallel Comput 38:157-174
- [9] Buck I et al (2004) Brook for GPUs: stream computing on graphics hardware. ACM Trans Graph 23:777-786
- [10] Hormati AH et al (2011) Sponge: portable stream programming on graphics engines. ACM SIGPLAN Not 46:381-392