RGB-D sensors have recently gained significant popularity owing to their affordable cost. Compared with the associated high-resolution (HR) color images, however, the captured depth maps typically have a much lower resolution, and their quality remains inadequate for many applications because of holes, noise, and artifacts. In this paper, we propose a clustering graph-based framework for depth map super-resolution. The framework uses an HR textured-intensity layer as guidance to enforce high-frequency details during depth map recovery. This textured layer is extracted from the consolidated HR intensity image through a texture-structure separation process based on a new relative total variation technique. Furthermore, instead of the standard sparse representation, which does not exploit local structural information effectively, we propose a novel clustered-graph sparse representation with a low-rank prior. With this joint representation, any signal can be coded effectively: the low-rank property captures global structural information, while intrinsic local information is preserved through a novel multiclass incoherence self-learning scheme between classes, together with a grouped coherence maintained within each class dictionary. We optimize the resulting joint objective function with the split Bregman algorithm. Experimental results on the Middlebury 2005, 2007, 2014 and real-world datasets demonstrate that the proposed algorithm is very efficient and outperforms state-of-the-art approaches in terms of both objective and subjective quality.
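The abstract names the split Bregman algorithm as the optimizer. As a minimal illustrative sketch only (this is a generic ℓ1-regularized least-squares instance, not the paper's joint clustered-graph/low-rank objective; the matrix `A`, weights `mu`, `lam`, and iteration count are all assumptions for the demo), split Bregman alternates a quadratic subproblem, a soft-thresholding step, and a Bregman-variable update:

```python
import numpy as np

def shrink(v, t):
    """Soft-thresholding: proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def split_bregman_l1(A, y, mu=10.0, lam=1.0, n_iter=200):
    """Solve min_x ||x||_1 + (mu/2)||A x - y||^2 via split Bregman.

    A splitting variable d = x and a Bregman variable b decouple the
    l1 term from the quadratic term, so each subproblem is easy:
      x-step: linear solve; d-step: soft-thresholding; b-step: update.
    """
    n = A.shape[1]
    x = np.zeros(n)
    d = np.zeros(n)
    b = np.zeros(n)
    # Precompute the normal-equation matrix for the x-subproblem.
    M = mu * A.T @ A + lam * np.eye(n)
    Aty = mu * A.T @ y
    for _ in range(n_iter):
        x = np.linalg.solve(M, Aty + lam * (d - b))  # quadratic step
        d = shrink(x + b, 1.0 / lam)                 # l1 proximal step
        b = b + x - d                                # Bregman update
    return x

# Tiny demo: recover a sparse vector from noisy linear measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 20))
x_true = np.zeros(20)
x_true[[3, 11]] = [2.0, -1.5]
y = A @ x_true + 0.01 * rng.standard_normal(40)
x_hat = split_bregman_l1(A, y)
```

In the paper's setting, the ℓ1 term would be replaced by the clustered-graph sparse coding penalty plus the low-rank prior, with each term handled by its own splitting variable in the same alternating fashion.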