Although human pose estimation has achieved great success, the ambiguity of joint prediction has not been well resolved, especially in complex situations (crowded scenes, occlusions, and abnormal poses). We attribute this to the noise introduced when multi-level features are fused by simply adding features at each position. To alleviate this problem, we propose a gated multi-scale feature fusion (GMSFF) module, which selectively imports high-level features to compensate for the semantic information missing from high-resolution feature maps. Inspired by the prior knowledge that the positions of joints can inform one another, we further propose a new refinement strategy for pose estimation, the spatial mutual information complementary module (SMICM). By capturing the information contained in the other joints, it helps the model adjust each joint's position at only a small additional computational cost. We evaluated the proposed method on four datasets: the MPII Human Pose Dataset (MPII), the COCO Keypoint Detection Dataset (COCO), the Occluded Human Dataset (OCHuman), and the CrowdPose Dataset. The experimental results show that the improvement grows as the occlusion and crowding levels of the datasets increase; in particular, we obtained a gain of 2.2 AP on the OCHuman dataset. In addition, both modules are plug-and-play.
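As a minimal illustration of the gating idea (not the authors' actual GMSFF implementation, whose details the abstract does not specify), the following PyTorch sketch replaces plain element-wise addition with a learned per-position gate that decides how much upsampled high-level semantic information to import into a high-resolution feature map. The class name, the 1x1-convolution gate, and all shapes are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GatedFusion(nn.Module):
    """Hypothetical sketch of gated multi-scale feature fusion.

    Instead of adding the upsampled high-level feature map to the
    high-resolution map directly, a learned per-position gate decides
    how much high-level semantic information to import at each location.
    """

    def __init__(self, channels: int):
        super().__init__()
        # Gate computed from both feature maps (assumed design, not the paper's).
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
        # Upsample the semantically rich, low-resolution map to match.
        high = F.interpolate(high, size=low.shape[-2:], mode="bilinear",
                             align_corners=False)
        g = self.gate(torch.cat([low, high], dim=1))  # per-position gate in [0, 1]
        # Plain fusion would be `low + high`; the gate suppresses noisy channels.
        return low + g * high


# Usage: fuse a 64x48 high-resolution map with a 32x24 high-level map.
fuse = GatedFusion(channels=32)
low = torch.randn(1, 32, 64, 48)
high = torch.randn(1, 32, 32, 24)
print(fuse(low, high).shape)  # torch.Size([1, 32, 64, 48])
```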
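The abstract only states that SMICM lets joints exchange spatial information at low cost; one plausible reading, sketched below purely for illustration, is an attention step in which each predicted joint attends to all others and emits a small coordinate correction. Every name and design choice here (the per-joint identity embedding, the attention head count, the offset head) is an assumption, not the authors' module.

```python
import torch
import torch.nn as nn


class JointInteraction(nn.Module):
    """Illustrative sketch: refine each joint using cues from the others."""

    def __init__(self, num_joints: int, dim: int):
        super().__init__()
        self.embed = nn.Linear(2, dim)                 # embed (x, y) coordinates
        self.joint_id = nn.Parameter(torch.zeros(num_joints, dim))  # joint identity
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.offset = nn.Linear(dim, 2)                # predict a coordinate correction

    def forward(self, coords: torch.Tensor) -> torch.Tensor:
        # coords: (batch, num_joints, 2) initial joint predictions
        x = self.embed(coords) + self.joint_id
        ctx, _ = self.attn(x, x, x)                    # each joint gathers cues from all joints
        return coords + self.offset(ctx)               # refined positions


refine = JointInteraction(num_joints=17, dim=64)
coords = torch.rand(1, 17, 2)                          # e.g. normalized COCO keypoints
print(refine(coords).shape)                            # torch.Size([1, 17, 2])
```

The attention matrix here is only `num_joints x num_joints` (17x17 for COCO), which is consistent with the abstract's claim that cross-joint reasoning adds little computational cost.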