Advances in radar sensing have enabled more accurate object detection and semantic segmentation by leveraging measurements of an object's distance, direction, and velocity, showing great potential for radar systems in scenarios such as human detection for smart vacuum cleaners or semantic segmentation for self-driving cars. However, the lack of large-scale annotated radar datasets and the significant human effort required to annotate radar points make it difficult to adopt radar-based sensing applications. Although radar point clouds are hard to annotate, synchronized radar and camera data are easy to collect, and pre-trained models for camera-based semantic segmentation already exist. Inspired by this, we propose RadarContrast, a self-supervised camera-to-radar knowledge distillation approach that reduces the annotation burden on raw radar data by leveraging existing vision-based pre-trained models. RadarContrast pre-trains the radar model by distilling from existing camera-based models on a large amount of unannotated data, and then requires only a small portion of annotated data to fine-tune the radar model. Specifically, we build the distillation on regions that most likely belong to the same object: we apply image segmentation algorithms to partition the image into objects and, instead of contrasting pixel-wise or point-wise, group pixels and radar points into superpixels and superpoints, respectively. A biased pooling strategy then transfers knowledge from the 2D camera features to the 3D radar point clouds. We evaluate RadarContrast on the nuScenes autonomous driving dataset and show that it achieves comparable semantic segmentation performance while using 5-10x less annotated data.
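To make the region-level distillation concrete, the sketch below is a minimal, hypothetical PyTorch illustration, not the RadarContrast implementation: per-pixel and per-point features are pooled within their superpixel/superpoint, and an InfoNCE-style loss aligns the pooled radar embedding of each region with the pooled camera embedding of the same region. The paper's biased pooling is approximated here by plain average pooling, and all function and variable names are illustrative assumptions.

```python
# Hypothetical sketch of segment-pooled contrastive camera-to-radar distillation.
# Not the actual RadarContrast code; "biased pooling" is replaced by average pooling.
import torch
import torch.nn.functional as F


def segment_pool(features, segment_ids, num_segments):
    """Average-pool per-pixel / per-point features into one embedding per segment.

    features:    (N, D) per-pixel or per-point features
    segment_ids: (N,)   superpixel / superpoint index for each element
    """
    pooled = torch.zeros(num_segments, features.size(1), device=features.device)
    pooled.index_add_(0, segment_ids, features)
    counts = torch.bincount(segment_ids, minlength=num_segments).clamp(min=1)
    return pooled / counts.unsqueeze(1)


def contrastive_distillation_loss(radar_feats, radar_seg, cam_feats, cam_seg,
                                  num_segments, temperature=0.07):
    """InfoNCE-style objective: the pooled superpoint embedding (radar student)
    of each region is pulled toward the pooled superpixel embedding (camera
    teacher) of the same region and pushed away from all other regions."""
    z_radar = F.normalize(segment_pool(radar_feats, radar_seg, num_segments), dim=1)
    z_cam = F.normalize(segment_pool(cam_feats, cam_seg, num_segments), dim=1)
    logits = z_radar @ z_cam.t() / temperature           # (S, S) region similarity matrix
    targets = torch.arange(num_segments, device=logits.device)
    return F.cross_entropy(logits, targets)              # matched regions lie on the diagonal
```

In a pre-training loop, this loss would be computed on synchronized camera-radar pairs with frozen camera features as the teacher; the radar encoder is then fine-tuned on the small labeled subset for the downstream semantic segmentation task.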