Multi-object tracking (MOT) is a crucial task in computer vision with numerous real-world applications. In recent years, Transformer-based models have shown promising results in MOT. However, existing methods still struggle in scenarios involving short-term object occlusion, camera motion, and ambiguous detections, especially in low frame-rate videos. In this work, we propose a novel variant of the TrackFormer model that addresses these limitations by integrating an online motion prediction module based on the Kalman Filter, thereby incorporating the important temporal information present in videos. The Kalman Filter enables the model to learn the tracked pedestrians' motion patterns and exploit them for reliable association of targets across video frames, even under short-term occlusion, while adding little to the computational complexity of the overall framework. We term the proposed model MotionFormer, owing to its inherent ability to utilize long-term motion information. Through extensive evaluations on the popular MOT-17 and MOT-20 tracking datasets, we demonstrate the effectiveness of our approach over existing methods. The results show that MotionFormer achieves competitive tracking accuracy with a substantially lower identity-switch rate than other models. Moreover, it significantly outperforms the base TrackFormer in tracking accuracy, F1-score, and identity-switch rate on MOT-17 (private and public) and MOT-20 (private) data.
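The motion prediction described above can be illustrated with a minimal sketch of a constant-velocity Kalman Filter over bounding boxes, of the kind commonly used in tracking-by-detection pipelines. This is an illustrative assumption, not the paper's exact formulation: the state layout [cx, cy, w, h, vx, vy], the noise settings, and the class name `BoxKalmanFilter` are all hypothetical choices made for the example.

```python
import numpy as np

class BoxKalmanFilter:
    """Hypothetical constant-velocity Kalman filter for one tracked box.

    State: [cx, cy, w, h, vx, vy] -- box centre, size, and centre velocity.
    Observations: detected boxes [cx, cy, w, h].
    """

    def __init__(self, box, dt=1.0):
        cx, cy, w, h = box
        self.x = np.array([cx, cy, w, h, 0.0, 0.0])  # state estimate
        self.P = np.eye(6) * 10.0                    # state covariance
        self.F = np.eye(6)                           # transition model
        self.F[0, 4] = self.F[1, 5] = dt             # cx += vx*dt, cy += vy*dt
        self.H = np.eye(4, 6)                        # observe [cx, cy, w, h] only
        self.Q = np.eye(6) * 0.01                    # process noise (assumed)
        self.R = np.eye(4) * 1.0                     # measurement noise (assumed)

    def predict(self):
        """Propagate the state one frame ahead; usable when the target is occluded."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:4]

    def update(self, box):
        """Correct the prediction with a new detection [cx, cy, w, h]."""
        z = np.asarray(box, dtype=float)
        y = z - self.H @ self.x                      # innovation
        S = self.H @ self.P @ self.H.T + self.R      # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)     # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(6) - K @ self.H) @ self.P
        return self.x[:4]

# Usage: a pedestrian moving right at ~5 px/frame, then briefly occluded.
kf = BoxKalmanFilter((100.0, 100.0, 20.0, 40.0))
for t in range(1, 5):
    kf.predict()
    kf.update((100.0 + 5.0 * t, 100.0, 20.0, 40.0))
occluded_pred = kf.predict()  # extrapolates the box past the last detection
```

During an occlusion, `predict()` keeps extrapolating the box along the learned motion, so the track can be re-associated with a matching detection once the target reappears, which is how filter-based motion cues reduce identity switches.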