At present, older adults are the fastest-growing segment of the driving population, which has led to higher rates of traffic accidents. Casualty data for accidents involving pedestrians and motor vehicles during the day and at night indicate that the proportion of fatalities is significantly higher at night. Consequently, improving the traffic safety of older people, reducing nighttime traffic accidents, and promoting sustainability are crucial for Japan, which faces the challenge of becoming a "super-aging" society. We therefore propose a system that uses infrared thermal imaging data to support the safety and security of pedestrians and drivers at night. In previous studies, we developed methods to detect pedestrian actions using a convolutional neural network (CNN)-based model, VGG16. In this study, we improve on an existing detection method by using an enhanced Faster R-CNN model to detect vehicles and recognize human actions in real time at night. We acquired new video data, captured with an infrared thermal camera, that show multiple people performing actions at a distance from the camera. With transfer learning, these data can be used to investigate vehicle detection and action recognition in scenes involving multiple people. We experimentally evaluated the detection accuracy of our method. The results show that it achieved a mean average precision (mAP) of 0.97 in detecting actions in scenes with multiple people positioned far from the camera, exceeding the accuracy of conventional methods.