Specific emitter identification (SEI) is the process of extracting features from received signals to identify individual emitters, and it plays a crucial role in enhancing the security of wireless systems. Conventional deep learning-based SEI approaches rely heavily on large-scale datasets, and their performance degrades significantly under few-shot conditions. Existing few-shot SEI methods also face challenges, such as insufficient feature representation learning. In this paper, we propose a novel CNN-Transformer-based framework, FCR-CT (Feature Contrastive Reconstruction with CNN-Transformer), combined with virtual adversarial training (VAT) to improve SEI performance under few-shot conditions. During the pretraining phase, self-supervised learning is employed to optimize the encoder parameters, using a cascade of CNN and Transformer modules to construct an encoder-decoder structure that reconstructs unlabeled signals. By introducing a feature contrastive loss, the model enhances intra-class compactness and inter-class separability in the feature space, improving its representation learning capability. In the semi-supervised phase, the decoder is replaced with a classifier, and VAT is applied to refine the feature boundaries, further boosting classification accuracy in few-shot scenarios. Experimental results on the open-source ADS-B dataset demonstrate that the proposed FCR-CT(VAT) method achieves a 90.52% average recognition rate across 10 categories, a 1.92% improvement over the model without VAT. For 30 categories with 20 samples each, the recognition rate reaches 68.65%, surpassing existing methods such as CVCNN, CNN-MAT, and SA-CNN by more than 5%. These results confirm the effectiveness and robustness of our approach in addressing the challenges of few-shot SEI in practical applications. The code is publicly available at: https://github.com/egglion/FCR-CT-_VAT.
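The feature contrastive objective mentioned above can be illustrated with a minimal NumPy sketch. This is an assumed supervised-contrastive formulation over L2-normalized encoder features (the paper's exact loss and hyperparameters may differ): same-class pairs are treated as positives and all other samples as negatives, so minimizing it increases intra-class compactness and inter-class separability.

```python
import numpy as np

def feature_contrastive_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss over a batch of encoder features.

    Sketch of the idea only: for each anchor, same-class samples are
    positives and the rest are negatives; lower loss means tighter
    classes and wider margins between classes in feature space.
    """
    # L2-normalize so pairwise dot products are cosine similarities.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T / temperature
    n = len(labels)
    loss, count = 0.0, 0
    for i in range(n):
        mask = np.arange(n) != i          # exclude the anchor itself
        logits = sim[i, mask]
        # log-softmax over all other samples for anchor i
        log_prob = logits - np.log(np.exp(logits - logits.max()).sum()) - logits.max()
        positives = labels[mask] == labels[i]
        if positives.any():
            loss += -log_prob[positives].mean()
            count += 1
    return loss / max(count, 1)
```

As a sanity check, a batch whose classes form tight, well-separated clusters yields a lower loss than a batch of unstructured features.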
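Virtual adversarial training, used in the semi-supervised phase, searches for the small input perturbation that most changes the model's output distribution and then smooths predictions against it. The sketch below shows the standard VAT power-iteration step on a toy linear-softmax classifier (an assumption for illustration; FCR-CT applies VAT to the CNN-Transformer encoder, and `xi`, `eps`, and the model here are hypothetical):

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def vat_perturbation(x, W, xi=1e-6, eps=0.5, n_iter=3):
    """Approximate the virtual adversarial perturbation for softmax(W @ x).

    Power iteration: probe the model at x + xi*d, take the gradient of
    KL(p(x) || p(x + xi*d)) with respect to the probe direction, and
    renormalize. The result is scaled to the radius eps.
    """
    p = softmax(W @ x)                                # current prediction
    d = np.random.default_rng(0).normal(size=x.shape) # random start direction
    for _ in range(n_iter):
        d = xi * d / np.linalg.norm(d)
        q = softmax(W @ (x + d))
        # analytic gradient of KL(p || q) w.r.t. the perturbation
        # for this linear-softmax model: W^T (q - p)
        d = W.T @ (q - p)
    return eps * d / np.linalg.norm(d)
```

During training, the VAT regularizer penalizes the divergence between predictions at `x` and `x + r`, which needs no labels and therefore suits few-shot and semi-supervised settings.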