This article proposes a high-performance architecture for the Two-Step Search algorithm used in half-pixel motion estimation. Motion estimation requires intensive computation on a large number of pixels stored in memory and therefore involves frequent memory access. The proposed architecture is built on an intelligent memory configuration that manages the large memory bandwidth required, and it implements the Two-Step Search algorithm for the variable block sizes recommended by the H.264 standard. Compared with a previously reported architecture, the proposed architecture can process up to 33% more high-definition television (HDTV) frames of size 1280×720 and consumes 5% less power, at the cost of only about 1.6% of the total chip area.