To protect "data-at-rest" in storage area networks from the risk of differential power analysis attacks without degrading performance, a high-throughput masked advanced encryption standard (AES) engine is proposed. However, such an engine usually adopts the unrolling technique, which requires extremely large field-programmable gate array (FPGA) resources. In this brief, we aim to optimize the area of a masked AES with an unrolled structure. We achieve this by mapping its operations from GF(2^8) to GF(2^4) as much as possible. We reduce the number of mapping [GF(2^8) to GF(2^4)] and inverse mapping [GF(2^4) to GF(2^8)] operations of the masked SubBytes step from ten to one. For compatibility, the masked MixColumns, masked AddRoundKey, and masked ShiftRows steps, including the redundant masking values, are carried out over GF(2^4). We also use FPGA block RAM (BRAM) to further reduce hardware resources. Compared with a state-of-the-art design, our implementation reduces the overall area by 36.2% (20.5% is contributed by the main method, and 15.7% by the BRAM optimization). It achieves 40.9 Gbit/s at 4.5 Mbit/s/slice on the Xilinx XC6VLX240T platform. We have attacked the iterative version of this masked AES in hardware. Results show that none of the key bytes can be guessed from the masked AES with the collected 10,000 power traces, whereas 14 out of 16 bytes can be guessed from the unprotected AES with the same number of traces.
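
The idea of mapping once and then staying in the smaller field can be illustrated with a minimal Python sketch. It is not the proposed hardware design. It assumes that "GF(2^4)" refers to the usual composite-field representation GF((2^4)^2), and it picks its own field polynomials (x^4 + x + 1 for GF(2^4), and y^2 + y + LAM found by search); the names `to_composite`, `from_composite`, `comp_mul`, `LAM`, and `BETA` are hypothetical. The sketch checks that the mapping is linear, so a Boolean-masked byte and its mask can be mapped independently, and that constant multiplication (as in MixColumns) can then be done on both shares without mapping back.

```python
def gf16_mul(a, b):
    """Multiply in GF(2^4) modulo x^4 + x + 1 (an assumed choice)."""
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13
    return r

def gf256_mul(a, b):
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return r

# Assumption: extend GF(2^4) with y^2 + y + LAM, where LAM is the first
# constant for which this polynomial has no root in GF(2^4).
LAM = next(l for l in range(1, 16)
           if all(gf16_mul(t, t) ^ t ^ l for t in range(16)))

def comp_mul(a, b):
    """Multiply bytes viewed as (high*y + low) in GF((2^4)^2)."""
    ah, al, bh, bl = a >> 4, a & 0xF, b >> 4, b & 0xF
    hh = gf16_mul(ah, bh)
    high = hh ^ gf16_mul(ah, bl) ^ gf16_mul(al, bh)
    low = gf16_mul(hh, LAM) ^ gf16_mul(al, bl)
    return (high << 4) | low

def is_root_of_aes_poly(beta):
    """Check beta^8 + beta^4 + beta^3 + beta + 1 == 0 in the composite field."""
    p = [1]
    for _ in range(8):
        p.append(comp_mul(p[-1], beta))
    return (p[8] ^ p[4] ^ p[3] ^ p[1] ^ p[0]) == 0

# Any root of the AES polynomial in the composite field defines an isomorphism.
BETA = next(b for b in range(2, 256) if is_root_of_aes_poly(b))
POWERS = [1]
for _ in range(7):
    POWERS.append(comp_mul(POWERS[-1], BETA))

def to_composite(x):
    """Map a GF(2^8) byte to GF((2^4)^2); linear over GF(2)."""
    r = 0
    for i in range(8):
        if (x >> i) & 1:
            r ^= POWERS[i]
    return r

INVERSE = {to_composite(x): x for x in range(256)}

def from_composite(c):
    """Inverse mapping back to GF(2^8)."""
    return INVERSE[c]

# Boolean masking: the data byte x is only ever handled as (x ^ m, m).
x, m = 0x53, 0xA7
masked_c, mask_c = to_composite(x ^ m), to_composite(m)

# Each share is mapped on its own; recombining gives the mapped data byte.
assert masked_c ^ mask_c == to_composite(x)

# A MixColumns-style multiplication by {02}, done on both shares in the
# composite field, matches the GF(2^8) result after a single inverse mapping.
two_c = to_composite(0x02)
result_c = comp_mul(two_c, masked_c) ^ comp_mul(two_c, mask_c)
assert from_composite(result_c) == gf256_mul(0x02, x)
```

Because the mapping preserves both XOR and multiplication, every round operation on the shares can stay in the composite domain, which is what allows the mapping and inverse mapping to be applied once rather than in every round.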