3D scene modeling of indoor environments has attracted significant interest in recent years. The resulting photo-realistic renderings of internal structures are used in a wide variety of civilian and military applications such as training, simulation, heritage conservation, and localization and mapping. However, building such maps poses significant challenges for both the computer vision and robotics communities: low lighting and textureless structures, transparent and specular surfaces, registration and fusion problems, coverage of fine details, real-time constraints, etc. Recently, the Microsoft Kinect sensor, originally developed as a gaming interface, has received a great deal of attention for its ability to produce high-quality depth maps in real time. However, we found that this active sensor fails completely on transparent and specular surfaces for several technical reasons. Since these objects should be included in the 3D model, we have investigated methods to capture them without any modification of the hardware. In particular, the passive Structure from Motion (SFM) technique can be efficiently integrated into the reconstruction process to improve the detection of these surfaces. Specifically, we propose to fill the holes in the depth map provided by the Kinect's infrared (IR) sensor with new values passively recovered by SFM. This yields a large amount of additional depth information in a relatively short time from two consecutive RGB frames. To preserve the real-time character of our approach, we select key RGB images instead of using all available frames. Experiments show a strong improvement in indoor reconstruction as well as in the inspection of transparent objects.
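
As a rough illustration of the hole-filling idea described above, the following Python sketch (using OpenCV and NumPy) triangulates sparse depth values from feature matches between two consecutive RGB frames and writes them into the zero-valued (missing) pixels of a Kinect depth map. The function name fill_depth_holes, the intrinsic matrix K, and the relative pose (R, t) are illustrative assumptions, not the paper's actual pipeline; keyframe selection is omitted, and (R, t) is assumed to carry metric scale (e.g. recovered by aligning SFM depths to valid Kinect measurements).

```python
import numpy as np
import cv2


def fill_depth_holes(depth, rgb_prev, rgb_curr, K, R, t):
    """Fill zero-valued (missing) pixels of a Kinect depth map with depths
    triangulated from two consecutive RGB frames (minimal SFM sketch).

    depth    : HxW depth map registered to rgb_curr (0 marks holes)
    rgb_prev : previous RGB frame
    rgb_curr : current RGB frame, aligned with `depth`
    K        : 3x3 RGB camera intrinsic matrix (assumed known)
    R, t     : relative pose of the current frame w.r.t. the previous one,
               assumed known and metrically scaled
    """
    # Detect and match ORB features between the two RGB frames.
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(rgb_prev, None)
    kp2, des2 = orb.detectAndCompute(rgb_curr, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)

    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches]).T  # 2xN
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches]).T  # 2xN

    # Projection matrices: previous view at the origin, current view at (R, t).
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t.reshape(3, 1)])

    # Triangulate sparse 3D points (homogeneous 4xN output).
    pts4d = cv2.triangulatePoints(P1, P2, pts1, pts2)
    pts3d = pts4d[:3] / pts4d[3]

    # Express the points in the current camera frame to obtain their depth
    # relative to the current view.
    pts3d_curr = R @ pts3d + t.reshape(3, 1)

    filled = depth.copy()
    for (u, v), z in zip(pts2.T, pts3d_curr[2]):
        r, c = int(round(v)), int(round(u))
        # Write only into holes (depth == 0) and only positive depths,
        # so the IR sensor's valid measurements are left untouched.
        if 0 <= r < depth.shape[0] and 0 <= c < depth.shape[1] \
                and depth[r, c] == 0 and z > 0:
            filled[r, c] = z
    return filled
```

The sketch only fills pixels that the IR sensor left empty; keeping the original measurements wherever they exist reflects the design choice of using SFM as a complement to, rather than a replacement for, the active depth data.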