In this paper, two new system architectures, overlap-state sequential and split-and-merge parallel, are proposed based on a novel boundary postprocessing technique for the computation of the discrete wavelet transform (DWT). The basic idea is to introduce multilevel partial computations for samples near data boundaries, based on a finite state machine model of the DWT derived from the lifting scheme. The key observation is that these partially computed (lifted) results can be stored back in their original locations, and the transform can be resumed at any later time as long as the partially computed results are preserved. It is shown that this extension of the in-place calculation feature of the original lifting algorithm greatly reduces the extra buffer overhead in sequential implementations and the communication overhead in parallel implementations. Performance analysis and experimental results show that, for the Daubechies (9,7) wavelet filters, the proposed boundary postprocessing technique reduces the minimum buffer size required by the line-based sequential DWT algorithm [1] by 40% relative to the best available approach. For the parallel DWT algorithm, we show 30% faster performance than existing approaches.
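
The partial-computation idea can be illustrated with a minimal sketch. This is not the architecture proposed in the paper. It makes several assumptions: a simple two-step (5/3-style) lifting filter stands in for the (9,7) filters, the signal is one-dimensional with a single decomposition level, the edges are mirrored, and the split point `cut` is even and at least 4. The function names (`lift53_full`, `lift53_first_block`, `lift53_resume`) are hypothetical. The first block is lifted in place, its two boundary samples are left partially computed in their original locations, and the transform is completed once the remaining samples arrive.

```python
def predict(y, i, n):
    """Full predict step for odd index i (mirror at the right edge)."""
    right = y[i + 1] if i + 1 < n else y[i - 1]
    y[i] -= 0.5 * (y[i - 1] + right)


def update(y, i, n):
    """Full update step for even index i (mirror at either edge)."""
    left = y[i - 1] if i >= 1 else y[i + 1]
    right = y[i + 1] if i + 1 < n else y[i - 1]
    y[i] += 0.25 * (left + right)


def lift53_full(x):
    """Reference: one-level lifting transform over the whole signal."""
    y = [float(v) for v in x]
    n = len(y)
    for i in range(1, n, 2):
        predict(y, i, n)
    for i in range(0, n, 2):
        update(y, i, n)
    return y


def lift53_first_block(y, cut):
    """Lift y[:cut] in place without reading y[cut:] (assumed not yet available)."""
    n = len(y)
    for i in range(1, cut - 1, 2):      # odd samples with both neighbours present
        predict(y, i, n)
    y[cut - 1] -= 0.5 * y[cut - 2]      # partial predict: left contribution only
    for i in range(0, cut - 2, 2):      # even samples with both neighbours complete
        update(y, i, n)
    y[cut - 2] += 0.25 * y[cut - 3]     # partial update: left contribution only


def lift53_resume(y, cut):
    """Finish the two partial boundary samples, then lift y[cut:] in place."""
    n = len(y)
    y[cut - 1] -= 0.5 * y[cut]          # finish predict; y[cut] is still raw here
    y[cut - 2] += 0.25 * y[cut - 1]     # finish update with the completed detail
    for i in range(cut + 1, n, 2):
        predict(y, i, n)
    for i in range(cut, n, 2):
        update(y, i, n)


# Usage: compare the two-stage in-place result with the one-pass reference.
signal = [3, 7, 1, 8, 2, 9, 4, 6, 5, 0, 2, 7]
reference = lift53_full(signal)

y = [float(v) for v in signal]
lift53_first_block(y, cut=6)   # samples 4 and 5 now hold partial results
lift53_resume(y, cut=6)        # boundary postprocessing plus the second block

max_err = max(abs(a - b) for a, b in zip(y, reference))
print(max_err)
```

In this sketch no samples are copied into a separate overlap buffer: the only state carried across the block boundary is the pair of partially lifted values already sitting in the array, which is the property the buffer and communication savings rely on.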