Most visual simultaneous localization and mapping (SLAM) systems rely on the static-environment assumption and cannot cope with substantial scene dynamics. In highly dynamic environments in particular, pose estimation errors tend to accumulate rapidly and can even cause the system to fail. To address this limitation, we have developed DI-SLAM, an enhanced real-time SLAM system for dynamic indoor environments that extends ORB-SLAM3. DI-SLAM adds a parallel object detection thread that uses an enhanced YOLOv5s to extract semantic information from every input frame, so that dynamic features can be filtered out during initial tracking and localization. We further integrate multi-view geometry to identify the remaining dynamic features, improving the accuracy and robustness of localization. Finally, we conducted experiments on the TUM RGB-D dataset to evaluate the proposed algorithm. The results show strong performance on most sequences, with a 97.06% improvement in localization accuracy over the original ORB-SLAM3 algorithm in indoor dynamic environments.
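The abstract does not detail how dynamic features are rejected, so the following is only a minimal sketch of the general two-stage idea under stated assumptions. First, keypoints falling inside bounding boxes of classes assumed to be dynamic (here only "person") are discarded. Second, a common multi-view geometric check (distance to the epipolar line given a fundamental matrix F) flags matches that violate the static-scene constraint. The names `DYNAMIC_CLASSES`, `Detection`, `semantic_filter`, `epipolar_distance`, `geometric_filter` and the 1-pixel threshold are hypothetical illustrations, not DI-SLAM's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

# Assumption: class labels treated as dynamic (hypothetical choice).
DYNAMIC_CLASSES = {"person"}


@dataclass
class Detection:
    """Axis-aligned detector box, e.g. from an object detector (hypothetical type)."""
    label: str
    x1: float
    y1: float
    x2: float
    y2: float

    def contains(self, u: float, v: float) -> bool:
        return self.x1 <= u <= self.x2 and self.y1 <= v <= self.y2


def semantic_filter(keypoints: Sequence[Point], detections: Sequence[Detection]) -> List[int]:
    """Return indices of keypoints that lie outside every dynamic-class box."""
    dynamic_boxes = [d for d in detections if d.label in DYNAMIC_CLASSES]
    return [
        i for i, (u, v) in enumerate(keypoints)
        if not any(box.contains(u, v) for box in dynamic_boxes)
    ]


def epipolar_distance(F: Sequence[Sequence[float]], p1: Point, p2: Point) -> float:
    """Pixel distance from p2 to the epipolar line l2 = F @ [p1, 1] in image 2."""
    a, b, c = [F[r][0] * p1[0] + F[r][1] * p1[1] + F[r][2] for r in range(3)]
    norm = (a * a + b * b) ** 0.5
    if norm == 0.0:
        return float("inf")  # degenerate line: treat the match as unreliable
    return abs(a * p2[0] + b * p2[1] + c) / norm


def geometric_filter(F, matches: Sequence[Tuple[Point, Point]], threshold: float = 1.0) -> List[int]:
    """Keep indices of matches consistent with the epipolar constraint (assumed static)."""
    return [i for i, (p1, p2) in enumerate(matches) if epipolar_distance(F, p1, p2) <= threshold]


if __name__ == "__main__":
    # Pure x-translation with identity intrinsics: F = [t]_x, epipolar lines are horizontal.
    F = [[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
    kps = [(10.0, 10.0), (50.0, 50.0), (200.0, 200.0)]
    dets = [Detection("person", 40, 40, 60, 60), Detection("chair", 190, 190, 210, 210)]
    print(semantic_filter(kps, dets))  # [0, 2]
    matches = [((10.0, 20.0), (15.0, 20.0)), ((10.0, 20.0), (10.0, 25.0))]
    print(geometric_filter(F, matches))  # [0]
```

Running the semantic stage first is cheap and removes points on obviously movable objects; the geometric stage can then catch moving points that the detector missed or that belong to objects whose class is not listed as dynamic.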