In the maintenance and management of rotating machinery, vibration signals recorded during operation reflect the health status of the system. Deep learning algorithms have enabled automatic feature extraction in vibration monitoring and diagnosis and are widely used in intelligent equipment management, but they still have limitations, particularly when training samples are scarce and when the diagnostic strategy needs to be optimized continuously. To address these limitations, this study introduces LiteDPER-CTQN (Lightweight Double Prioritized Experience Replay with Convolutional Transformer Q-Network), a novel fault diagnosis agent based on reinforcement learning. The agent improves feature extraction and model adaptation through three key innovations: a lightweight reinforcement learning framework that ensures efficient and stable training, an enhanced Transformer-based architecture that enables multi-scale feature fusion, and an integrated intelligent diagnosis system. Experiments on both bench-test data and electric locomotive data show that the method achieves higher diagnostic accuracy, faster convergence, and lower computational resource consumption than state-of-the-art approaches, and that visualizing the Q-value function improves the interpretability of the decision-making process.