This paper presents a parallel decoding algorithm for turbo product codes (TPCs) that decodes P (P ≥ 2) linear code vectors of a product code simultaneously, increasing the decoding throughput by a factor of P without performance degradation. An 8-parallel decoder for the TPC (64,57,4)², implemented on an FPGA, achieves a decoding throughput of 40 Mbit/s with four decoding iterations at a 72 MHz clock.
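
The idea that the row (or column) vectors of a product code can be decoded independently, and hence P at a time, can be illustrated with a minimal sketch. It is not the decoder described in this paper: as assumptions made only for illustration, it uses a small (7,4) Hamming component code in place of the (64,57,4) code, hard-decision syndrome decoding in place of iterative soft-decision decoding, and NumPy batching in place of parallel hardware units. The names `decode_batch` and `decode_product` are hypothetical.

```python
import numpy as np

# Parity-check matrix of the (7,4) Hamming code (illustrative stand-in).
# Column j holds the binary representation of j+1, so the syndrome, read as
# an integer, equals the 1-based position of a single bit error.
H = np.array([[(j >> b) & 1 for j in range(1, 8)] for b in range(3)],
             dtype=np.uint8)

def decode_batch(vectors):
    """Correct at most one bit error in each of the P rows of `vectors`."""
    vectors = vectors.copy()
    syndromes = (vectors @ H.T) % 2                 # shape (P, 3)
    positions = syndromes @ np.array([1, 2, 4])     # 0 means no error detected
    rows = np.nonzero(positions)[0]
    vectors[rows, positions[rows] - 1] ^= 1         # flip the indicated bits
    return vectors

def decode_product(received, P, iterations=1):
    """Decode all rows, then all columns, taking P vectors per step."""
    block = received.copy()
    for _ in range(iterations):
        for view in (block, block.T):               # block.T is a view: columns
            for start in range(0, view.shape[0], P):
                view[start:start + P] = decode_batch(view[start:start + P])
    return block

# A valid product codeword: the outer product of a component codeword with itself.
c = np.array([1, 1, 1, 0, 0, 0, 0], dtype=np.uint8)  # H @ c = 0 (mod 2)
codeword = np.outer(c, c)

# Inject errors, including two in row 2, which row decoding alone cannot fix.
received = codeword.copy()
for r, k in [(0, 0), (6, 2), (2, 1), (2, 4)]:
    received[r, k] ^= 1

sequential = decode_product(received, P=1)
parallel = decode_product(received, P=4)
print(np.array_equal(sequential, parallel), np.array_equal(parallel, codeword))
```

Because each vector is decoded from its own syndrome only, grouping P vectors into one step changes the number of steps per half-iteration (from n to about n/P) but not the decoded result, which is the property the throughput claim relies on.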