As a promising direction to alleviate the issue of high cost in label annotation, active learning has made a remarkable progress in image recognition. Recently, some efforts have tried to brought it into more visual tasks such as object detection, human-object interaction and so on. Despite the effectiveness of existing methods, there still lacking research specifically for temporal scenes, which is common in real world vision task scenes like automatic driving. To fill this gap, we propose a inconsistency based active learning method specific for temporal object detection. Specifically, We design two novel active learning strategies to ensure that the selected samples are informative and contain less redundant information in temporal scenarios. In the first stage, we adopt temporal inconsistency strategy to avoid sampling too much redundant samples; in the second stage, we sample the most informative samples by utilizing the committee inconsistency strategy. Additionally, we conduct extensive experiments on Waymo dataset, the experimental results have demonstrate the effectiveness of our proposed method. Specifically, we achieve the same performance as training on the whole dataset when just sampling 20% samples, which is much better than entropy based method.