A high-speed, area-efficient merged multiply-accumulate (MAC) unit is proposed in this work. To realize it, we first examine the critical delays and hardware complexities of conventional MAC architectures in order to arrive at a unit with low critical delay and low hardware complexity. The new architecture is based on binary trees constructed from modified 4:2 compressor circuits. The overall area is reduced by fully utilizing the compressors instead of tying their free inputs to zero. The speed of operation is increased by keeping the modified compressor out of the critical path. Feeding the bits of the accumulated operand into the summation tree before the final adder also helps to increase the speed. The proposed MAC unit and the previous merged MAC unit are both mapped onto a Field Programmable Gate Array (FPGA) chip so that they can be compared. The simulation results show that the proposed 8-bit, 16-bit, and 32-bit MAC units reduce area by 6.25%, 3.2%, and 2.5% and increase speed by 14%, 16%, and 19%, respectively. The proposed 8-bit MAC was tested experimentally on an XESS demo board (XSA-100, Spartan-X2S100tq144).
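
As a rough illustration of the building blocks named above, the following Python sketch models a conventional 4:2 compressor built from two full adders, together with the idea of treating the accumulated operand as one more row of the partial-product array. It is a behavioral sketch only: the modification to the compressor proposed in this work is not modeled, operands are assumed unsigned, and the function names (`full_adder`, `compressor_4_2`, `merged_mac_rows`, `check_compressor`) are hypothetical.

```python
from itertools import product


def full_adder(a, b, c):
    """One-bit full adder: returns (sum, carry)."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry


def compressor_4_2(x1, x2, x3, x4, cin):
    """Conventional 4:2 compressor (assumed structure: two chained full adders).

    Returns (sum, carry, cout). The sum bit has weight 1; carry and cout
    both have weight 2.
    """
    s1, cout = full_adder(x1, x2, x3)
    total, carry = full_adder(s1, x4, cin)
    return total, carry, cout


def check_compressor():
    """Exhaustively check x1+x2+x3+x4+cin == sum + 2*(carry + cout)."""
    for x1, x2, x3, x4, cin in product((0, 1), repeat=5):
        s, c, co = compressor_4_2(x1, x2, x3, x4, cin)
        if x1 + x2 + x3 + x4 + cin != s + 2 * (c + co):
            return False
    return True


def merged_mac_rows(a, b, acc, n):
    """Rows entering the summation tree of a merged MAC (unsigned, n-bit b).

    The accumulated operand is appended as an extra row, so it is summed
    together with the partial products rather than in a separate adder.
    """
    rows = [a << i for i in range(n) if (b >> i) & 1]
    rows.append(acc)
    return rows
```

Summing the rows returned by `merged_mac_rows(a, b, acc, n)` gives `a * b + acc`; in hardware, that summation would be carried out column by column by the compressor tree, followed by the final adder.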