Visual Simultaneous Localization and Mapping (VSLAM) is a core technology for vision-based autonomous mobile robots. However, existing methods often suffer from low localization accuracy and poor robustness in scenes with large scale variations or sparse texture, primarily due to insufficient feature extraction and reduced matching precision. To address these challenges, this paper proposes an improved multi-scale local feature matching algorithm based on LoFTR, named MSpGLoFTR. First, we introduce a Multi-Scale Local Attention Module (MSLAM), which achieves feature fusion and resolution alignment through multi-scale window partitioning and a shared multi-layer perceptron (MLP). Second, a Multi-Scale Parallel Attention Module is designed to capture features at different scales, improving the model's adaptability to large-scale structures and highly similar pixel regions. Finally, a Gated Convolutional Network (GCN) mechanism dynamically adjusts feature weights, emphasizing key features while suppressing background noise, thereby further improving matching precision and robustness. Experimental results show that MSpGLoFTR outperforms LoFTR in matching precision, relative pose estimation, and adaptability to complex scenes, and that it is particularly effective under significant illumination changes, scale variations, and viewpoint shifts. These properties make MSpGLoFTR an efficient and robust feature matching solution for challenging vision tasks.
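
The abstract does not give implementation details. As a rough illustration of the two mechanisms it names, multi-scale window partitioning fused through a shared MLP and gated re-weighting of features, a minimal PyTorch sketch might look like the following. All module names, layer sizes, and window sizes here are assumptions for illustration, not the authors' code.

```python
# A minimal sketch (not the MSpGLoFTR implementation) of two ideas named in the
# abstract: multi-scale window pooling fused by a shared MLP, and a gated
# convolution that re-weights features. All hyperparameters are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiScaleSharedMLP(nn.Module):
    """Pool features over several window sizes, then fuse them with one shared MLP."""

    def __init__(self, channels: int, window_sizes=(2, 4, 8)):
        super().__init__()
        self.window_sizes = window_sizes
        # One MLP shared across all scales (implemented as 1x1 convolutions).
        self.shared_mlp = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        fused = 0
        for s in self.window_sizes:
            # Partition into windows via average pooling, process at the coarse
            # resolution, then upsample back so all scales share one resolution.
            pooled = F.avg_pool2d(x, kernel_size=s, stride=s, ceil_mode=True)
            scale_feat = self.shared_mlp(pooled)
            fused = fused + F.interpolate(
                scale_feat, size=(h, w), mode="bilinear", align_corners=False
            )
        return fused / len(self.window_sizes)


class GatedConvFusion(nn.Module):
    """Gated convolution: a sigmoid gate emphasizes salient features and suppresses background."""

    def __init__(self, channels: int):
        super().__init__()
        self.feature_conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.gate_conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate = torch.sigmoid(self.gate_conv(x))  # per-pixel, per-channel weights in (0, 1)
        return self.feature_conv(x) * gate


if __name__ == "__main__":
    feat = torch.randn(1, 256, 60, 80)  # coarse feature map, LoFTR-style 1/8 resolution
    fused = MultiScaleSharedMLP(256)(feat)
    out = GatedConvFusion(256)(fused)
    print(out.shape)  # torch.Size([1, 256, 60, 80])
```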