Emotion recognition from speech is gaining popularity in the research community. Speech Emotion Recognition (SER) systems are applicable in a variety of scenarios, such as health-care systems, monitoring systems, and autonomous driving systems, to name a few. However, interpreting the results of an SER system and providing human-understandable reasoning is a topic that few works have addressed. We propose a SincNet-based emotion recognition engine that makes use of the interpretable filters of the first layer to explain the reasoning behind the model's inferences. We use the IEMOCAP dataset and compare our emotion recognition results with state-of-the-art algorithms. We also propose an explainability technique that provides insight into both the model and its inferences. To the best of our knowledge, the proposed scheme is novel and achieves good performance for emotion recognition from speech.
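The interpretability claim rests on SincNet's first layer, whose kernels are parameterized band-pass filters defined only by two learnable cutoff frequencies. A minimal NumPy sketch of that parameterization is shown below; the kernel length, sampling rate, and Hamming window are illustrative assumptions, not the trained model's values (in the actual network, f1 and f2 are learned by backpropagation).

```python
import numpy as np

def sinc_bandpass(f1_hz, f2_hz, kernel_len=251, fs=16000):
    """Band-pass FIR kernel built as the difference of two low-pass
    sinc filters, following the SincNet parameterization: only the
    cutoff frequencies f1_hz and f2_hz would be learned."""
    # Symmetric sample grid centered on zero
    n = np.arange(kernel_len) - (kernel_len - 1) / 2
    f1, f2 = f1_hz / fs, f2_hz / fs  # normalized cutoffs
    # np.sinc(x) = sin(pi*x)/(pi*x), so 2f*sinc(2f*n) is an ideal low-pass;
    # subtracting two low-passes yields a band-pass response
    kernel = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)
    kernel *= np.hamming(kernel_len)  # window to reduce spectral ripple
    return kernel

# Example: a filter passing roughly 300-3000 Hz at 16 kHz sampling
k = sinc_bandpass(300.0, 3000.0)
```

Because each kernel is fully described by its pass band, inspecting which frequency bands the trained filters select gives a direct, human-readable account of what the first layer attends to, which is the basis of the explainability technique described above.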