Generating high-quality, dense reconstructions of target scenes from monocular images is an essential capability for augmented reality and robotics. However, inherent shortcomings such as scale ambiguity make monocular 3D reconstruction difficult to apply in real-world settings. We propose a new monocular-inertial dense SLAM method that addresses the limitations of traditional monocular dense SLAM by combining deep learning with multi-view geometry. We use an auto-masking strategy to filter out relatively static pixels across adjacent frames in the image sequence, improving depth estimation accuracy. In addition, we combine geometric, photometric, and IMU constraints into a globally consistent pose estimation method, which improves camera localization accuracy and enhances the quality of the dense reconstruction. Experiments show that our method achieves satisfactory results.
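To illustrate the auto-masking of relatively static pixels mentioned above, the following is a minimal sketch of one common formulation (in the spirit of Monodepth2-style auto-masking): a pixel is kept only if warping the source frame into the target view lowers the photometric error compared with using the unwarped source frame directly. The function names (`photometric_error`, `auto_mask`) and the plain L1 error are illustrative assumptions, not the paper's exact implementation (many methods use a weighted SSIM + L1 term).

```python
import torch

def photometric_error(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Per-pixel L1 photometric error (simplified stand-in for SSIM + L1).

    pred, target: image tensors of shape [B, C, H, W].
    Returns a per-pixel error map of shape [B, 1, H, W].
    """
    return torch.mean(torch.abs(pred - target), dim=1, keepdim=True)

def auto_mask(target: torch.Tensor,
              warped_sources: list[torch.Tensor],
              raw_sources: list[torch.Tensor]) -> torch.Tensor:
    """Binary mask that shields relatively static pixels.

    target:         the target frame I_t, shape [B, C, H, W]
    warped_sources: source frames warped into the target view using the
                    predicted depth and relative pose
    raw_sources:    the same source frames, unwarped

    A pixel is kept (mask = 1) only if the best warped reprojection explains
    it better than simply copying the unwarped source frame, i.e. the pixel's
    appearance change is consistent with camera motion rather than being
    (relatively) static between frames.
    """
    warped_err = torch.stack(
        [photometric_error(w, target) for w in warped_sources]
    ).min(dim=0).values
    identity_err = torch.stack(
        [photometric_error(s, target) for s in raw_sources]
    ).min(dim=0).values
    return (warped_err < identity_err).float()
```

In a typical training loop, this mask would multiply the per-pixel photometric loss before averaging, so that pixels which appear static between adjacent frames do not corrupt the depth supervision signal.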