Building footprint extraction is crucial for applications such as disaster management, change detection, and 3D modeling. Satellite and aerial images, combined with deep learning techniques, offer an effective means of performing this task. The Multi-scale Aggregation Fully Convolutional Network (MA-FCN) is an encoder-decoder model that emphasizes scale information: it produces the final segmentation map by concatenating four feature maps from different stages of the decoder. To improve segmentation accuracy, we propose two deep learning models, Attention MA-FCN and Residual Attention MA-FCN. Attention MA-FCN places attention gates in the skip connections so that the model focuses on the features and image regions most relevant to buildings. Residual Attention MA-FCN additionally integrates residual blocks into the architecture; combining attention mechanisms with residual blocks makes the model more robust to vanishing gradients and overfitting, which allows deeper networks to be trained. Both models were evaluated on the WHU, Massachusetts, and Jinghai District datasets and outperformed the original MA-FCN. Specifically, in terms of the Intersection over Union (IoU) metric, Residual Attention MA-FCN outperformed MA-FCN and Attention MA-FCN by 3.6% and 0.92% on the WHU dataset, and by 5.51% and 0.91% on the Massachusetts dataset. On the Jinghai District dataset, Residual Attention MA-FCN also surpassed MA-FCN, Attention MA-FCN, Mask R-CNN, and U-Net. These results indicate that the proposed methods are more accurate than MA-FCN, achieving higher IoU and F1-score values.
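For readers unfamiliar with the two evaluation metrics named above, the following is a minimal sketch of how IoU and F1-score are commonly computed for a binary building mask. It is not the evaluation code used in this study: the function names (`confusion_counts`, `iou`, `f1_score`), the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
def confusion_counts(pred, truth):
    """Count true positives, false positives, and false negatives
    for the building class (label 1) over two equally sized 2D masks."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == 1 and t == 1:
                tp += 1
            elif p == 1 and t == 0:
                fp += 1
            elif p == 0 and t == 1:
                fn += 1
    return tp, fp, fn


def iou(pred, truth):
    """Intersection over Union: TP / (TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, truth)
    denom = tp + fp + fn
    # Assumed convention: two empty masks count as a perfect match.
    return tp / denom if denom else 1.0


def f1_score(pred, truth):
    """F1-score: 2 * TP / (2 * TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    # Same assumed convention for the empty case.
    return 2 * tp / denom if denom else 1.0


# Hypothetical 3x3 masks: 1 = building pixel, 0 = background.
truth = [[0, 1, 1],
         [0, 1, 1],
         [0, 0, 0]]
pred = [[0, 1, 1],
        [0, 0, 1],
        [0, 1, 0]]

print(f"IoU = {iou(pred, truth):.2f}, F1 = {f1_score(pred, truth):.2f}")
```

For the same prediction, F1-score is never lower than IoU, so a gain of a given size in IoU generally corresponds to a smaller gain in F1-score.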