Objective: In recent years, brain-computer interface (BCI) technology has advanced rapidly. Motor imagery (MI), one of the principal BCI paradigms, has found extensive applications in domains such as rehabilitation, entertainment, and neuroscience, and its effective classification has emerged as a central research problem. Electroencephalography (EEG) is an essential tool for studying MI classification. However, existing models cannot fully extract effective motor information from noise-contaminated EEG data, so their final classification performance falls short of expectations. To address this problem, we propose a deep temporal network based on multi-branch feature fusion and an attention mechanism. The network combines multi-branch feature fusion, feature expansion, attention, and temporal decoding modules. Methods: First, primary features of the EEG signals are extracted by a multi-branch convolutional neural network and then fused. Next, feature expansion and attention mechanisms are employed to suppress noise interference while highlighting the essential MI intentions. Finally, a temporal decoding module mines the temporal information in the MI data and performs classification. Results: Model performance was evaluated on the BCI_IV_2a, BCI_IV_2b, and OpenBMI datasets under both subject-specific and subject-independent protocols. The model achieved significant performance improvements on all three datasets, reaching accuracies of 81.21%, 93.12%, and 75.9%, respectively, outperforming the baseline models. Conclusion: The experimental results indicate that the proposed model effectively leverages deep learning for classifying different types of MI, providing a reference framework for the development of more efficient MI-BCI systems.
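The processing pipeline outlined in the Methods section can be sketched in a simplified, framework-free form. This is an illustrative sketch only: the array shapes, kernel lengths, energy-based channel attention, and average-pooling "decoder" below are assumptions for demonstration, not the authors' actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, kernel):
    # Valid-mode temporal convolution applied independently per EEG channel
    return np.stack([np.convolve(ch, kernel, mode="valid") for ch in x])

# Toy EEG trial: 22 channels x 256 time samples (BCI_IV_2a-like shape, assumed)
x = rng.standard_normal((22, 256))

# Multi-branch temporal convolutions with different kernel lengths,
# capturing MI dynamics at multiple time scales
branches = [conv1d(x, rng.standard_normal(k) / k) for k in (8, 16, 32)]

# Fusion: crop all branches to a common length, then concatenate along channels
T = min(b.shape[1] for b in branches)
fused = np.concatenate([b[:, :T] for b in branches], axis=0)  # (66, T)

# Channel attention (hypothetical): softmax over per-channel signal energy
# re-weights the fused feature map, emphasizing informative channels
energy = (fused ** 2).mean(axis=1)
weights = np.exp(energy - energy.max())
weights /= weights.sum()
attended = fused * weights[:, None]

# Temporal decoding placeholder: global average pooling to a feature vector
# that a downstream classifier would consume
features = attended.mean(axis=1)
print(features.shape)  # (66,)
```

In practice each stage would be a learned module (convolutional filters, attention projections, and a recurrent or attention-based temporal decoder) trained end to end; the sketch only shows how the data flows between the four modules named in the abstract.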