Electroencephalogram (EEG) signal classification is of utmost importance in brain-computer interface (BCI) systems. However, the inherently complex properties of EEG signals make them challenging to analyze and model. This paper proposes a novel approach that integrates the wavelet scattering transform (WST) with a convolutional neural network (CNN), referred to as WST-CNN, for classifying motor imagery (MI) from EEG signals; the model can extract distinctive signal characteristics even when training data are limited. In this architecture, the first layer of WST-CNN consists of non-trainable WST features with fixed initialization. Furthermore, WSTs are robust to local perturbations in the data, in particular through translation invariance, and are resilient to deformations, thereby enhancing the network's reliability. The performance of the proposed approach is evaluated on the DBCIE dataset for three scenarios: left-arm (LA) movement, right-arm (RA) movement, and simultaneous movement of both arms (BA). The BCI Competition IV-2a dataset is also employed to validate the proposed concept across four distinct MI tasks: left-hand (LH), right-hand (RH), feet (FT), and tongue (T) movements. Classification performance is evaluated in terms of accuracy (η), sensitivity (S_e), specificity (S_p), and weighted F1-score, which reached up to 92.72%, 92.72%, 97.57%, and 92.75%, respectively, for classifying LH, RH, FT, and T on the BCI Competition IV-2a dataset, and 89.19%, 89.19%, 94.60%, and 89.33% for classifying LA, RA, and BA on the DBCIE dataset.
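To make the translation-invariance property concrete, the following is a minimal, illustrative sketch (not the paper's implementation) of a first-order scattering-style feature extractor: a fixed, non-trainable wavelet-like filter bank, a modulus nonlinearity, and time averaging. The filter shapes, bandwidths, and the synthetic test signal are all assumptions made for illustration; in practice a full WST (e.g., via a scattering library) would feed these features into the CNN.

```python
import numpy as np

def gaussian_bandpass(n, center_freq, bandwidth):
    # Frequency-domain Gaussian bump as a crude stand-in for a Morlet wavelet.
    # Parameters here are illustrative, not those of the paper.
    freqs = np.fft.fftfreq(n)
    return np.exp(-((freqs - center_freq) ** 2) / (2 * bandwidth ** 2))

def scattering_features(x, n_filters=8):
    # First-order scattering-like features: fixed filter bank -> modulus -> averaging.
    # The filters are non-trainable, mirroring the fixed WST front end described above.
    n = len(x)
    X = np.fft.fft(x)
    feats = []
    for k in range(1, n_filters + 1):
        cf = 0.5 * k / (n_filters + 1)            # center frequencies spread over (0, 0.5)
        psi = gaussian_bandpass(n, cf, 0.02)
        u = np.abs(np.fft.ifft(X * psi))          # modulus of wavelet coefficients
        feats.append(u.mean())                    # global averaging -> translation invariance
    return np.array(feats)

# Toy "EEG-like" signal: two sinusoids plus noise (synthetic, for illustration only).
rng = np.random.default_rng(0)
t = np.arange(512)
x = (np.sin(2 * np.pi * 0.05 * t)
     + 0.5 * np.sin(2 * np.pi * 0.2 * t)
     + 0.1 * rng.standard_normal(512))

f = scattering_features(x)
f_shifted = scattering_features(np.roll(x, 25))   # circularly shifted copy
print(np.allclose(f, f_shifted))                  # averaged modulus is shift-invariant
```

Because the averaging discards phase after the modulus, a circular shift of the input leaves the feature vector unchanged up to floating-point error, which is the robustness property the abstract attributes to the WST front end.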