Matrix multiplication is a fundamental operation of numerical linear algebra, and applied widely in high performance computing to solve scientific and engineering problems. It requires computer systems have huge computing capacity and data throughputs as problem size is increased, and consumes much more power. In this research, an OpenCL-based matrix multiplier is presented to improve energy efficiency. When data are single precision floating-point, and matrix dimension is 16384x16384, the matrix multiplier implemented by the FPGA board DE5a-NET achieves 240.34 GFLOPs in data throughput and 19.64 GFLOPs/W in energy efficiency, which are 296 times and 1964 times over the software simulation carried out on a PC with 32 GB DDR4 RAMs and an AMD processor Ryzen 7 1700 running at 3.0 GHz, respectively.