Although modern object detection models have the potential to be widely used in applications such as autonomous vehicles, they are computationally demanding, and high-resolution image data further increases the computational burden. We therefore propose an acceleration method for object detection in video sequences, called Group of Picture (GoP) mode, which removes temporal redundancy rather than compressing the model as existing schemes do. A GoP structure consists of exactly one key frame and several non-key frames. In GoP mode, object detection is applied to key frames only, while object tracking predicts the position of each object in the following non-key frames from its tracked trajectory and momentum. The resulting latency savings accelerate execution severalfold, so that high detection accuracy and high execution speed can be obtained together. In theory, a GoP structure of one key frame and N non-key frames accelerates object detection by a factor of (1+N). We analyze how the number of non-key frames affects the accuracy of an object detector equipped with GoP mode. According to the experimental results, the mean average precision (mAP) of GoP mode with four non-key frames per GoP structure is comparable to that of running object detection on every frame. Meanwhile, the execution frame rate on the Jetson Nano mobile platform increases from the original 8 frames per second (FPS) to 35.8 FPS, i.e., an increase of 348% (about 4.5 times the original frame rate).
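
The GoP-mode schedule described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Track`, `run_gop_mode`, and `theoretical_speedup` are hypothetical, the detector is passed in as an arbitrary callable, objects are matched between key frames simply by list index, and "trajectory and momentum" is approximated by a constant per-frame velocity estimated from the two most recent key frames. The (1+N) figure is the idealized bound that treats the cost of tracking as negligible.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


@dataclass
class Track:
    key_box: Box                                  # box from the last key-frame detection
    box: Box                                      # current (possibly predicted) box
    velocity: Tuple[float, float] = (0.0, 0.0)    # pixels per frame


def theoretical_speedup(n_non_key: int) -> int:
    """Ideal speedup for one key frame plus N non-key frames (tracking cost ignored)."""
    return 1 + n_non_key


def run_gop_mode(
    frames: Sequence,
    detect: Callable[[object], List[Box]],
    n_non_key: int,
) -> Tuple[List[List[Box]], int]:
    """Run the detector on key frames only; extrapolate boxes on non-key frames."""
    gop_len = 1 + n_non_key
    tracks: List[Track] = []
    outputs: List[List[Box]] = []
    detector_calls = 0

    for i, frame in enumerate(frames):
        if i % gop_len == 0:
            # Key frame: run the (expensive) detector and refresh every track.
            boxes = detect(frame)
            detector_calls += 1
            refreshed = []
            for j, box in enumerate(boxes):
                velocity = (0.0, 0.0)
                if j < len(tracks):
                    # Assumption: index-based association between key frames.
                    prev = tracks[j].key_box
                    velocity = ((box[0] - prev[0]) / gop_len,
                                (box[1] - prev[1]) / gop_len)
                refreshed.append(Track(key_box=box, box=box, velocity=velocity))
            tracks = refreshed
        else:
            # Non-key frame: cheap constant-velocity prediction, no detector call.
            for t in tracks:
                x, y, w, h = t.box
                t.box = (x + t.velocity[0], y + t.velocity[1], w, h)
        outputs.append([t.box for t in tracks])

    return outputs, detector_calls
```

With four non-key frames per GoP, `theoretical_speedup(4)` gives the ideal factor of 5, against which the measured ratio 35.8 / 8 ≈ 4.5 can be compared. A practical tracker would replace the index-based matching and constant-velocity step with a proper association and motion model.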