Novel systolic and super-systolic architectures are presented for polynomial basis multiplication over GF(2(m)) based on irreducible trinomials. By suitable cut-set retiming, we have derived here an efficient bit-level-pipelined bit-parallel systolic design for binary field multiplication which requires fewer gates and registers and involves nearly half the time-complexity of the corresponding existing design. We have also suggested a digit-level-pipelined design, which involves lower latency, and fewer registers compared with the bit-level-pipelined structure. Moreover, we have proposed a super-systolic design consisting of a set of systolic arrays in a systolic-pipeline and a pipelined systolic-block design consisting of a pipelined blocks of concurrent systolic arrays. The super-systolic designs have the same average computation time and the same critical path as the proposed bit-level-pipelined design, but can be used to reduce the latency by a factor O(root m) at the cost of marginally higher number of XOR gates and bit-registers. The hardware complexities of proposed super-systolic designs are nearly three times that of the existing bit-parallel structures, but offer very high throughput compared with the others for large values of m. For the field orders m = 233 and m = 409, the proposed structures offer, respectively, ten and eleven times more throughput than the others.