Extracting road maps from high-resolution remotely sensed imagery has many practical applications: improving connectivity in remote areas, monitoring urban expansion, and providing aid to disaster-prone regions. Despite the abundance of satellite and aerial imagery, substantial obstacles such as occlusions, shadows, inter-class similarity, and missing roadways persist and demand researchers' attention. This paper proposes a memory-efficient, end-to-end convolutional neural network architecture called ConnectNet that exploits the powerful features of the Res2Net model. Its hierarchical design captures multi-scale features within every residual block, enhancing the model's representational ability. A new block, Stacked Feature Fusion, is proposed, in which dilated convolution layers with different dilation rates are stacked with squeeze-and-excitation blocks. This block captures both long-range and short-range dependencies, mitigating the loss of boundary and edge detail. A new loss function, the collective loss, combines the dice coefficient, binary cross-entropy, and Lovász sigmoid losses; it shortens convergence time and addresses the class imbalance issue. Extensive experiments compare the proposed model with other road extraction methods on two publicly available datasets, the Massachusetts Roads Dataset and the SpaceNet 3 Road Network Detection Dataset. Quantitative and visual results show that the proposed model outperforms state-of-the-art methods by a significant margin.
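
To make the Stacked Feature Fusion idea concrete, the sketch below combines dilated convolutions of different rates with squeeze-and-excitation gating in PyTorch. It is a minimal illustration, not the paper's implementation: the specific dilation rates, the parallel-branch arrangement, and the 1x1 fusion convolution are assumptions, since the abstract does not specify them.

```python
import torch
import torch.nn as nn


class SqueezeExcite(nn.Module):
    """Standard squeeze-and-excitation: global-average-pool, then
    a two-layer bottleneck that produces per-channel gates."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))          # (N, C) channel weights
        return x * w[:, :, None, None]            # re-scale feature maps


class StackedFeatureFusion(nn.Module):
    """Hypothetical sketch of the SFF block: parallel dilated branches
    (small rates for short-range context, large rates for long-range),
    each gated by SE, fused by a 1x1 convolution. Rates are assumed."""

    def __init__(self, channels, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
                SqueezeExcite(channels),
            )
            for r in rates
        )
        self.fuse = nn.Conv2d(channels * len(rates), channels, 1)

    def forward(self, x):
        # Concatenate all dilated, SE-gated branches, then fuse.
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))


# Example: fuse a 64-channel feature map without changing its shape.
# x = torch.randn(2, 64, 128, 128)
# y = StackedFeatureFusion(64)(x)   # -> (2, 64, 128, 128)
```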
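The collective loss can likewise be sketched as a weighted sum of the three named terms. The equal weights, the smoothing constant in the dice term, and the flattening over the whole batch are assumptions; the Lovász term follows the standard binary Lovász extension (Berman et al., 2018) applied to sigmoid probabilities.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def lovasz_grad(gt_sorted):
    """Gradient of the Lovasz extension of the Jaccard index with
    respect to sorted errors (Berman et al., 2018)."""
    p = len(gt_sorted)
    gts = gt_sorted.sum()
    intersection = gts - gt_sorted.cumsum(0)
    union = gts + (1.0 - gt_sorted).cumsum(0)
    jaccard = 1.0 - intersection / union
    if p > 1:
        jaccard[1:p] = jaccard[1:p] - jaccard[0:-1]
    return jaccard


def lovasz_sigmoid(probs, labels):
    """Binary Lovasz loss on sigmoid probabilities."""
    probs = probs.reshape(-1)
    labels = labels.reshape(-1).float()
    errors = (labels - probs).abs()
    errors_sorted, perm = torch.sort(errors, descending=True)
    return torch.dot(errors_sorted, lovasz_grad(labels[perm]))


def dice_loss(probs, labels, eps=1.0):
    """Soft dice loss; eps is an assumed smoothing constant."""
    probs = probs.reshape(-1)
    labels = labels.reshape(-1).float()
    inter = (probs * labels).sum()
    return 1.0 - (2.0 * inter + eps) / (probs.sum() + labels.sum() + eps)


class CollectiveLoss(nn.Module):
    """Sum of BCE, dice, and Lovasz sigmoid terms. Equal weights are
    an assumption; the paper may weight the terms differently."""

    def __init__(self, w_bce=1.0, w_dice=1.0, w_lovasz=1.0):
        super().__init__()
        self.w_bce, self.w_dice, self.w_lovasz = w_bce, w_dice, w_lovasz

    def forward(self, logits, labels):
        probs = torch.sigmoid(logits)
        return (
            self.w_bce * F.binary_cross_entropy_with_logits(logits, labels.float())
            + self.w_dice * dice_loss(probs, labels)
            + self.w_lovasz * lovasz_sigmoid(probs, labels)
        )


# Example: logits and binary road masks of shape (N, 1, H, W).
# criterion = CollectiveLoss()
# loss = criterion(logits, masks)
```

Combining the three terms is a common recipe for thin-structure segmentation: BCE gives stable per-pixel gradients, while the dice and Lovász terms directly target overlap and Jaccard quality, which helps when road pixels are a small fraction of each image.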