Traditional visual simultaneous localization and mapping (SLAM) methods excel in static, texture-rich environments but may struggle in dynamic or textureless ones. Recent advances in dynamic SLAM incorporate deep learning techniques to eliminate dynamic objects; however, these methods can fail on dynamic objects that the network does not recognize, and removing dynamic feature points exacerbates feature scarcity. To address these limitations, we propose MSSD-SLAM, a multifeature semantic RGB-D inertial SLAM system that incorporates point, line, and plane features to enhance robustness and enrich static map fidelity. By embedding structural constraints on 3-D spatial features constructed from multiframe observations, the system guarantees geometric consistency and accurate camera pose estimation, supporting subsequent dynamic object detection. We further develop a novel Dynamic Filter that efficiently handles both semantically recognizable and unrecognizable objects by fusing semantic segmentation, structural constraints, and inertial measurement unit (IMU) measurements. Short-term dynamic objects are detected through the consistency of multisource information, while long-term dynamic objects, which may remain temporarily static, are identified through covisible projection across multiple frames. Validation on the TUM dataset and in real-world scenarios demonstrates that MSSD-SLAM improves localization accuracy over ORB-SLAM3 and Dynamic-VINS by 76% and 64%, respectively, indicating superior accuracy and robustness in dynamic indoor scenes compared with state-of-the-art algorithms.
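To illustrate the core idea behind the Dynamic Filter described above, the sketch below shows a minimal per-feature check that combines two of the cues the abstract mentions: a semantic-segmentation label and a reprojection-consistency test against a pose prediction (e.g., from IMU propagation). All names (`DYNAMIC_CLASSES`, `classify_features`, the threshold value) are hypothetical illustrations, not the paper's actual implementation.

```python
import numpy as np

# Hypothetical set of semantic classes treated as a priori dynamic.
DYNAMIC_CLASSES = {"person", "cat", "dog"}

def classify_features(points, labels, pose, K, obs_prev, reproj_thresh=2.0):
    """Flag each 3-D feature as 'static' or 'dynamic'.

    A feature is dynamic if (a) its semantic label belongs to a known
    dynamic class, or (b) reprojecting it into the previous frame with
    the predicted relative pose (R, t) disagrees with its tracked pixel
    observation by more than reproj_thresh pixels.
    """
    R, t = pose
    flags = []
    for p, lbl, uv in zip(points, labels, obs_prev):
        if lbl in DYNAMIC_CLASSES:          # semantic cue
            flags.append("dynamic")
            continue
        q = K @ (R @ p + t)                  # pinhole projection
        uv_hat = q[:2] / q[2]
        # geometric cue: reprojection residual against tracked observation
        dynamic = np.linalg.norm(uv_hat - np.asarray(uv)) > reproj_thresh
        flags.append("dynamic" if dynamic else "static")
    return flags
```

A usage example with an identity relative pose: a well-tracked chair point passes the residual test, a person is rejected semantically, and an unrecognized but moving point is rejected geometrically.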