Automatic extraction of built-up areas from very high-resolution (VHR) satellite images has received increasing attention in recent years. However, because the spectral and spatial characteristics of built-up areas are complex, obtaining their precise location and extent remains challenging. In this study, a patch-based framework was proposed for unsupervised extraction of built-up areas from VHR imagery. First, a group of corner-constrained overlapping patches was defined to locate the candidate built-up areas. Second, the salient textures and structural characteristics of each patch were represented as a feature vector using integrated high-frequency wavelet coefficients. Then, inspired by visual perception, a patch-level saliency model of built-up areas was constructed by incorporating the Gestalt laws of proximity and similarity, which effectively describe the spatial relationships between patches. Finally, built-up areas were extracted through thresholding, and their boundaries were refined by morphological operations. The performance of the proposed method was evaluated on two VHR image datasets. The resulting average F-measure values were 0.8613 for the Google Earth dataset and 0.88 for the WorldView-2 dataset. Compared with existing models, the proposed method produces extraction results with more precise boundaries and better-preserved shape integrity.
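The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation, and every function name, parameter, and formula in it is an assumption made for illustration. A regular overlapping grid stands in for the corner-constrained patches, one-level Haar detail coefficients stand in for the integrated high-frequency wavelet features, the proximity and similarity terms are simple Gaussian weights, the threshold is the mean saliency, and the morphological boundary refinement is omitted.

```python
import numpy as np


def haar_detail_features(patch):
    """Mean absolute one-level Haar detail coefficients of a patch (3 values)."""
    p = np.asarray(patch, dtype=float)
    p = p[: p.shape[0] // 2 * 2, : p.shape[1] // 2 * 2]  # crop to even size
    a, b = p[0::2, 0::2], p[0::2, 1::2]
    c, d = p[1::2, 0::2], p[1::2, 1::2]
    row_diff = (a + b - c - d) / 2.0  # differences between adjacent rows
    col_diff = (a - b + c - d) / 2.0  # differences between adjacent columns
    diag = (a - b - c + d) / 2.0      # diagonal detail
    return np.array([np.abs(x).mean() for x in (row_diff, col_diff, diag)])


def patch_saliency(image, size=16, stride=8, sigma_d=16.0):
    """Score overlapping grid patches (assumed stand-in for corner-based ones)."""
    h, w = image.shape
    centers, feats = [], []
    for r in range(0, h - size + 1, stride):
        for c in range(0, w - size + 1, stride):
            centers.append((r + size / 2.0, c + size / 2.0))
            feats.append(haar_detail_features(image[r:r + size, c:c + size]))
    centers, feats = np.array(centers), np.array(feats)
    energy = np.linalg.norm(feats, axis=1)  # texture strength per patch

    # Pairwise squared distances in image space and in feature space.
    d2 = ((centers[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    f2 = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(-1)
    sigma_f = np.sqrt(f2.mean()) + 1e-12  # assumed data-driven feature scale

    # Assumed weighting: nearby (proximity) and alike (similarity) patches
    # reinforce each other; a patch does not reinforce itself.
    weights = np.exp(-d2 / (2 * sigma_d**2)) * np.exp(-f2 / (2 * sigma_f**2))
    np.fill_diagonal(weights, 0.0)
    saliency = energy * (weights @ energy)
    peak = saliency.max()
    if peak > 0:
        saliency = saliency / peak  # normalize so the top score is 1
    return centers, saliency


def extract_mask(image, size=16, stride=8):
    """Mark pixels covered by patches whose saliency exceeds the mean score."""
    centers, saliency = patch_saliency(image, size, stride)
    threshold = saliency.mean()  # assumed threshold rule
    mask = np.zeros(image.shape, dtype=bool)
    for (cr, cc), s in zip(centers, saliency):
        if s > threshold:
            r, c = int(cr - size / 2), int(cc - size / 2)
            mask[r:r + size, c:c + size] = True
    return mask


# Synthetic example: flat left half, strongly textured right half.
rng = np.random.default_rng(0)
image = np.zeros((64, 64))
image[:, 32:] = rng.integers(0, 2, size=(64, 32)) * 255.0
mask = extract_mask(image)
```

On this synthetic image the flat region yields zero detail energy, so only patches that overlap the textured half can pass the threshold. On real VHR imagery the patch size, stride, and `sigma_d` would need tuning, and a morphological closing or opening step would normally follow.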