32 entries in total
[1]
Dongarra JJ (1994) Scalability issues affecting the design of a dense linear algebra library. J. Parallel Distrib. Comput. 22: 523-537
[2]
van de Geijn RA (2009) Parallel LDPC decoding on GPUs using a stream-based computing approach. Journal of Computer Science and Technology 24: 913-924
[3]
Walker DW (2003) The LINPACK benchmark: Past, present and future. Concurrency and Computation: Practice and Experience 15: 803-820
[4]
Falcao G (1990) A set of level 3 basic linear algebra subprograms. ACM Trans. Math. Softw. 16: 1-17
[5]
Yamagiwa S (2008) Original 45 nm Intels Core2 processor performance. Intel Technology Journal 11: 157-168
[6]
Silva V (1995) A three-dimensional approach to parallel matrix multiplication. IBM Journal of Research and Development 39: 575-582
[7]
Sousa L (2008) Merge: A programming model for heterogeneous multi-core systems. SIGOPS Oper. Syst. Rev. 42: 287-296
[8]
Dongarra JJ (2007) Introduction to the Cell Broadband Engine architecture. IBM J. Res. Dev. 51: 503-519
[9]
Luszczek P
[10]
Petitet A