We present a low bit-rate video compression system that integrates region-based coding with a spatio-temporal wavelet transform. The system is designed for monitoring and video-phone applications. It distinguishes between moving foreground and static background, although the segmentation could also be derived from other sources. The regions are encoded in separate layers using a chroma-keying technique that allows a controlled, lossy recovery of the region boundaries. A 3D wavelet transform is applied to a group of frames of the prediction residual signal. Statistical dependencies between transform coefficients from different subbands are captured by conditional probability models. Without layered coding, the system outperforms the recent H.263 standard for very low bit-rate coding. With layered coding, visual quality degrades slightly at the same bit rate.
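The abstract does not name the wavelet filter or the decomposition depth. The following is a minimal sketch of what a separable 3D wavelet transform over a group of frames looks like. It assumes a single-level orthonormal Haar filter applied along x, y and t, with the residual stored as `volume[t][y][x]`. The function names (`haar_1d`, `forward_3d`, `inverse_3d`) are hypothetical and not taken from the described system.

```python
import math

def haar_1d(signal):
    """One level of the orthonormal Haar transform: low-pass half, then high-pass half."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = 1.0 / math.sqrt(2.0)
    low = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return low + high

def inverse_haar_1d(coeffs):
    """Invert haar_1d: interleave reconstructed even/odd samples."""
    half = len(coeffs) // 2
    s = 1.0 / math.sqrt(2.0)
    out = []
    for low, high in zip(coeffs[:half], coeffs[half:]):
        out.append((low + high) * s)
        out.append((low - high) * s)
    return out

def _apply_along_axes(volume, fn):
    """Apply a 1D transform along x, then y, then t of a volume[t][y][x].

    Returns a new nested list; the input volume is not modified.
    """
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    # Horizontal pass: transform every row of every frame.
    v = [[fn(list(row)) for row in frame] for frame in volume]
    # Vertical pass: transform every column of every frame.
    for t in range(T):
        for x in range(W):
            col = fn([v[t][y][x] for y in range(H)])
            for y in range(H):
                v[t][y][x] = col[y]
    # Temporal pass: transform every pixel's trajectory across the frames.
    for y in range(H):
        for x in range(W):
            traj = fn([v[t][y][x] for t in range(T)])
            for t in range(T):
                v[t][y][x] = traj[t]
    return v

def forward_3d(volume):
    """Single-level separable 3D Haar analysis of a group of frames."""
    return _apply_along_axes(volume, haar_1d)

def inverse_3d(coeffs):
    """Synthesis; separable passes along different axes commute, so order is irrelevant."""
    return _apply_along_axes(coeffs, inverse_haar_1d)

if __name__ == "__main__":
    frame = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    gof = [[row[:] for row in frame] for _ in range(2)]  # two identical (static) frames
    coeffs = forward_3d(gof)
    print(coeffs[1])  # temporal high-pass half: all zeros for a static scene
```

With two identical frames, every coefficient in the temporal high-pass half (`coeffs[1]`) is exactly zero. This illustrates why static content becomes cheap to code after a temporal wavelet step. A practical coder would use longer filters and several decomposition levels, and would then entropy-code the subbands, here assumed to use context models like the conditional probability models mentioned above.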