Semantic segmentation of road-scene images for autonomous driving is a dense, pixel-level prediction task that must run in real time. Deep learning research has devoted extensive effort to improving segmentation accuracy, and network architecture design is central to that effort. On edge devices, the task becomes even more challenging due to limited computing power. While very deep encoder-decoder networks achieve high accuracy, their slow inference and large parameter counts make them unsuitable for small devices; decoder-less models are fast but suffer a loss of accuracy. To address this trade-off, we propose a novel architecture with a shallow decoder. Its core building block leverages a multi-scale feature pyramid and efficiently learns semantic and contextual features, forming the basis of our network design. The network further benefits from uniquely placed encoder skip connections, which retain low-level features and thereby preserve the boundary information that is often lost in deep networks. Experiments on the highly competitive Cityscapes and CamVid datasets demonstrate the efficiency of the proposed architecture. Our model achieves mean intersection-over-union scores of 72.5% and 67.5% on the Cityscapes and CamVid test sets, respectively, with only 0.6 million parameters, while running in real time.
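
To make the abstract's central idea concrete, the following is a minimal PyTorch sketch of what a parameter-efficient multi-scale feature pyramid block might look like. The branch layout, pooling scales, use of depthwise-separable convolutions, and the residual fusion are our illustrative assumptions, not the paper's actual design; they only demonstrate the general technique of learning contextual features at multiple scales with a small parameter budget.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiScalePyramidBlock(nn.Module):
    """Illustrative multi-scale feature pyramid block (hypothetical layout).

    Pools the input at several scales, applies a lightweight separable
    convolution on each pooled map, upsamples back to the input resolution,
    and fuses the branches with a 1x1 projection plus a residual connection.
    """

    def __init__(self, channels: int, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.branches = nn.ModuleList(
            nn.Sequential(
                # depthwise 3x3 followed by pointwise 1x1 (separable conv)
                # keeps the parameter count low, matching the efficiency goal
                nn.Conv2d(channels, channels, 3, padding=1,
                          groups=channels, bias=False),
                nn.Conv2d(channels, channels, 1, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for _ in scales
        )
        # fuse the concatenated branch outputs back to `channels`
        self.fuse = nn.Conv2d(channels * len(scales), channels, 1, bias=False)

    def forward(self, x):
        h, w = x.shape[2:]
        feats = []
        for scale, branch in zip(self.scales, self.branches):
            # downsample, process, then restore the spatial resolution
            y = F.avg_pool2d(x, scale) if scale > 1 else x
            y = branch(y)
            if scale > 1:
                y = F.interpolate(y, size=(h, w), mode="bilinear",
                                  align_corners=False)
            feats.append(y)
        # residual connection preserves the original low-level features
        return x + self.fuse(torch.cat(feats, dim=1))


if __name__ == "__main__":
    block = MultiScalePyramidBlock(channels=64)
    out = block(torch.randn(1, 64, 128, 256))
    print(out.shape)  # torch.Size([1, 64, 128, 256])
```

The residual add in `forward` plays a role analogous to the abstract's skip connections: it carries the unprocessed features past the multi-scale branches, which is one common way to keep boundary detail from being washed out.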