Yawning is an important indicator of driver drowsiness or fatigue. Techniques for automatically detecting a driver's yawning have been developed for use as a component of driver fatigue monitoring systems. However, accurately detecting yawning events in real time remains challenging: in applications such as driver fatigue detection, illumination conditions vary over a broad range, and drivers' facial features vary in size, shape, texture, and degree of distortion. In this paper, we present a deep neural network model for yawning detection, built using transfer learning and sequential learning from yawning video clips as well as augmented images. Unlike many other methods that follow a pipeline of face ROI detection, eye/nose/mouth localization, and mouth open/close determination, the proposed system detects yawning events directly from video frames without requiring the positions of any facial parts, and is therefore robust to variations in object scale, position, and subject viewing angle. The system has been evaluated on the publicly available yawning data sets YawDD and NTHU-DDD, as well as on a data set containing challenging yawning videos. The experimental results show that the proposed system detects yawning events with high precision even when the face is turned away from the camera by up to 70 degrees, while remaining scale- and position-invariant. In addition, the model discriminates yawning events well from actions involving similar mouth opening and closing motions, such as talking and laughing.
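
To make the described architecture concrete, the sketch below shows one plausible way to combine transfer learning with sequential learning for clip-level yawning classification: a pretrained CNN backbone extracts per-frame features, and a recurrent layer aggregates them over time. The backbone choice (ResNet-18), layer sizes, clip length, and class count are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (not the authors' implementation): transfer learning
# via an ImageNet-pretrained CNN backbone, plus sequential learning via
# an LSTM over per-frame features, classifying a clip as yawning or not.
import torch
import torch.nn as nn
from torchvision import models


class YawnSequenceClassifier(nn.Module):
    def __init__(self, hidden_size: int = 256, num_classes: int = 2):
        super().__init__()
        # Transfer learning: reuse a pretrained backbone and drop its
        # final classification layer, keeping pooled frame features.
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        feat_dim = backbone.fc.in_features  # 512 for ResNet-18
        backbone.fc = nn.Identity()
        self.backbone = backbone
        # Sequential learning: an LSTM aggregates per-frame features
        # across the clip to capture the temporal mouth motion.
        self.lstm = nn.LSTM(feat_dim, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_classes)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, time, 3, H, W) video frames
        b, t, c, h, w = clips.shape
        frames = clips.reshape(b * t, c, h, w)
        feats = self.backbone(frames).reshape(b, t, -1)
        _, (h_n, _) = self.lstm(feats)
        return self.head(h_n[-1])  # clip-level logits


if __name__ == "__main__":
    model = YawnSequenceClassifier()
    dummy = torch.randn(2, 16, 3, 224, 224)  # 2 clips of 16 frames each
    print(model(dummy).shape)                # torch.Size([2, 2])
```

Because the classifier consumes whole frames rather than cropped facial regions, no face or mouth localization step is needed at inference time, consistent with the end-to-end design the abstract describes.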