Cloud movement directly drives fluctuations in solar radiation, so cloud motion vector (CMV) estimation is widely applied to sequential cloud images to predict solar radiation and to support related meteorological studies. However, traditional block-matching, optical-flow, and feature-point methods lack a deep semantic understanding of cloud images and therefore struggle to capture the motion of deforming, multilayered, and mixed-type clouds. In addition, without cloud-motion-labeled data, deep learning tools such as CNNs are of limited use for motion assessment. This letter therefore proposes a deep feature matching method for cloud images that estimates the CMV over a time series through image enhancement, self-supervised feature extraction, feature matching, feature fusion, and spatiotemporal filtering. Experimental results show a significant improvement in accuracy over traditional CMV estimation techniques, together with greater robustness across a variety of complex cloud scenarios.
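To make the feature matching step concrete, the following is a minimal sketch of how matched descriptors in two consecutive frames can yield motion vectors. It is an illustration under stated assumptions, not the method of this letter: the function name `match_motion_vectors` is hypothetical, mutual nearest-neighbour matching on cosine similarity is an assumed matching rule, and random descriptors stand in for the self-supervised features.

```python
import numpy as np


def match_motion_vectors(kp_a, desc_a, kp_b, desc_b):
    """Hypothetical helper: mutual nearest-neighbour descriptor matching.

    kp_*   : (N, 2) keypoint coordinates in frames A and B
    desc_* : (N, D) feature descriptors for those keypoints
    Returns the matched start points and their displacement vectors.
    """
    # L2-normalise so the dot product equals cosine similarity.
    a = desc_a / np.linalg.norm(desc_a, axis=1, keepdims=True)
    b = desc_b / np.linalg.norm(desc_b, axis=1, keepdims=True)
    sim = a @ b.T

    best_b = sim.argmax(axis=1)  # best match in B for each point in A
    best_a = sim.argmax(axis=0)  # best match in A for each point in B

    # Keep only pairs that choose each other (assumed matching rule).
    mutual = best_a[best_b] == np.arange(len(a))
    src = kp_a[mutual]
    dst = kp_b[best_b[mutual]]
    return src, dst - src


# Synthetic check: frame B is frame A shifted by a known vector,
# with the point order shuffled. Descriptors are random stand-ins.
rng = np.random.default_rng(0)
kp_a = rng.uniform(0, 100, size=(20, 2))
desc_a = rng.normal(size=(20, 32))
shift = np.array([3.0, -2.0])
perm = rng.permutation(20)
kp_b = (kp_a + shift)[perm]
desc_b = desc_a[perm]

src, vectors = match_motion_vectors(kp_a, desc_a, kp_b, desc_b)
cmv = np.median(vectors, axis=0)  # crude summary of the per-point vectors
```

On this synthetic pair, the median of the per-point vectors recovers the imposed shift. Real cloud images would also need the enhancement, fusion, and spatiotemporal filtering stages named above, which this sketch does not attempt.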