18 entries in total
- [1] Press W H, Teukolsky S A., Biconjugate gradient method for sparse linear systems, Computers in Physics, 6, 4, pp. 400-410, (1992)
- [2] Li Kenli, Yang Wangdong, Li Keqin, Performance analysis and optimization for SpMV on GPU using probabilistic modeling, IEEE Transactions on Parallel and Distributed Systems, 26, 1, pp. 196-205, (2015)
- [3] Li Shigang, Hu Changjun, Zhang Junchao, et al., Automatic tuning of sparse matrix-vector multiplication on multicore clusters, Science China Information Sciences, 58, 9, pp. 1-14, (2015)
- [4] Zitova B, Flusser J., Image registration methods: A survey, Image and Vision Computing, 21, 11, pp. 977-1000, (2003)
- [5] Qadeer W, Hameed R, Shacham O, et al., Convolution engine: Balancing efficiency and flexibility in specialized computing, ACM SIGARCH Computer Architecture News, 41, 3, pp. 24-35, (2013)
- [6] Uhl A., Wavelet packet best basis selection on moderate parallel MIMD architectures, Parallel Computing, 22, 1, pp. 149-158, (1996)
- [7] Chakrabarti C, Vishwanath M., Efficient realizations of the discrete and continuous wavelet transforms: From single chip implementations to mappings on SIMD array computers, IEEE Transactions on Signal Processing, 43, 3, pp. 759-771, (1995)
- [8] Konstantinidis E, Cotronis Y., A quantitative roofline model for GPU kernel performance estimation using micro-benchmarks and hardware metric profiling, Journal of Parallel and Distributed Computing, 107, 1, pp. 37-56, (2017)
- [9] Ilic A, Pratas F, Sousa L., Cache-aware roofline model: Upgrading the loft, IEEE Computer Architecture Letters, 13, 1, pp. 21-24, (2013)
- [10] Stengel H, Treibig J, Hager G, et al., Quantifying performance bottlenecks of stencil computations using the execution-cache-memory model, Proc of the 29th ACM on Int Conf on Supercomputing, pp. 207-216, (2015)