The collaborative modeling of convolutional neural networks (CNNs) and transformers has achieved promising results in fault diagnosis, because transformers are good at extracting global information and CNNs are good at extracting local features. However, in real industrial production, fault diagnosis frequently faces heavy noise interference and the constraints of limited hardware. To address these problems, a lightweight and robust fault diagnosis framework called PMCFormer is proposed. First, a multiscale partial convolution module is designed to strengthen the model's attention to multiple local receptive fields in the vibration signal, extract local feature information, and greatly reduce the amount of computation. Second, a pooling agent self-attention block is developed to capture the global features of the input signal, enhance the perception of the relationship between local and global signals, and keep the computational complexity linear, avoiding costly operations such as multi-dimensional exponential operations. Results from three experiments show that, compared with existing mainstream transformer and CNN fault diagnosis methods, the framework is both lightweight and robust in strong-noise environments. In particular, PMCFormer achieves an average accuracy of 94.72% in the fault diagnosis of subway bogie axle box bearings under 0-10 dB noise interference.
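To make the 0-10 dB noise condition concrete, the following is a minimal sketch of how a vibration signal could be corrupted at a chosen signal-to-noise ratio, using the relation SNR_dB = 10 log10(P_signal / P_noise). It assumes additive zero-mean Gaussian noise, which the abstract does not specify; the function names `add_noise_at_snr` and `measured_snr_db` and the synthetic sine signal are hypothetical and serve only as illustration, not as the procedure used in the experiments.

```python
import math
import random


def add_noise_at_snr(signal, snr_db, rng=None):
    """Return signal plus zero-mean Gaussian noise scaled to the target SNR (dB).

    Assumption: the noise is additive white Gaussian noise.
    """
    rng = rng or random.Random()
    # Average power of the clean signal.
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR in dB between a clean signal and its noisy version."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    # Hypothetical stand-in for a vibration signal: a 50 Hz sine sampled at 10 kHz.
    clean = [math.sin(2 * math.pi * 50 * t / 10_000) for t in range(20_000)]
    for snr in (0, 5, 10):
        noisy = add_noise_at_snr(clean, snr, random.Random(0))
        print(snr, round(measured_snr_db(clean, noisy), 2))
```

At 0 dB the noise power equals the signal power, so the lower end of the 0-10 dB range is a demanding test condition: the fault signature carries no more energy than the interference.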