Deep learning methods are promising tools for machinery fault diagnosis in modern industry. However, their diagnostic performance deteriorates significantly with insufficient labeled data or varying working conditions. To address this problem, a self-supervised learning (SSL) method with a self-attention mechanism is proposed for machinery fault diagnosis under limited labeled data and nonstationary working conditions. A simple but effective pretext task is first constructed to take full advantage of unlabeled data collected under varying working conditions. By completing the pretext task, a pretrained model is obtained whose feature representations are shaped by the constraint of signal masking and reconstruction, thus enhancing its capability to extract discriminative features. Then, a fine-tuned model, which inherits the feature-extraction capability of the pretrained model through knowledge transfer, is built using only a few labeled samples. Moreover, in our SSL method, a transformer architecture with a self-attention mechanism is used as the backbone to strengthen the modeling of global dependencies and the extraction of temporal information. The experimental results reveal that our method achieves better diagnostic performance under new working conditions than six advanced methods. Furthermore, a discussion of the relationship between reconstruction capability and classification performance is provided, which demonstrates the rationality of the designed pretext task.
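To make the masking-and-reconstruction pretext task concrete, the following is a minimal sketch of how such pretraining could be set up with a transformer encoder in PyTorch. The patching scheme, layer sizes, mask ratio, and class names here are illustrative assumptions for exposition only, not the paper's exact implementation.

```python
# Minimal sketch of a masked-reconstruction pretext task with a transformer
# encoder (self-attention backbone). Hyperparameters and the patching scheme
# are assumptions, not the paper's exact setup.
import torch
import torch.nn as nn

class MaskedSignalPretrainer(nn.Module):
    def __init__(self, patch_len=64, d_model=128, n_heads=4, n_layers=3, mask_ratio=0.5):
        super().__init__()
        self.patch_len = patch_len
        self.mask_ratio = mask_ratio
        self.embed = nn.Linear(patch_len, d_model)            # patch -> token embedding
        self.mask_token = nn.Parameter(torch.zeros(d_model))  # learnable placeholder for masked patches
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                               dim_feedforward=256, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, n_layers)  # self-attention backbone
        self.reconstruct = nn.Linear(d_model, patch_len)      # token -> reconstructed patch

    def forward(self, signal):
        # signal: (batch, length); length must be a multiple of patch_len
        b, length = signal.shape
        patches = signal.view(b, length // self.patch_len, self.patch_len)
        tokens = self.embed(patches)                           # (b, n_patches, d_model)
        mask = torch.rand(b, tokens.size(1), device=signal.device) < self.mask_ratio
        tokens = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(tokens), tokens)
        recon = self.reconstruct(self.encoder(tokens))         # reconstruct every patch
        # loss only on masked patches, forcing the encoder to model temporal context
        loss = ((recon - patches) ** 2)[mask].mean()
        return loss

# Usage sketch: pretrain on unlabeled vibration segments, then reuse `embed`
# and `encoder` as the feature extractor for a fine-tuned classifier trained
# on the few labeled samples.
model = MaskedSignalPretrainer()
dummy_batch = torch.randn(8, 1024)   # 8 unlabeled segments of 1024 samples
loss = model(dummy_batch)
loss.backward()
```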