The recent success of deep image watermarking has demonstrated the potential of deep learning for watermarking and has drawn increasing attention to deep video watermarking, with the aim of improving its robustness and perceptual quality. Video watermarking is considerably more challenging than image watermarking because of the rich structure of video data and the diversity of attacks in the video transmission pipeline. Existing deep video watermarking schemes remain far from satisfactory in dealing with temporal attacks, e.g., frame averaging, frame dropping, and transcoding. To this end, this paper proposes a novel deep framework for Robustness Enhanced Video watermarking (REVMark), which aims to improve overall robustness, especially against H.264/AVC compression, while maintaining good visual quality. REVMark has an encoder/decoder structure with a pre-processing block (TAsBlock) that extracts temporal-associated features from aligned frames. To enable end-to-end robust training, a distortion layer is integrated into REVMark to simulate various attacks encountered in real-world scenarios. As part of this layer, a new differentiable simulator of video compression, DiffH264, is developed to approximate the process of H.264/AVC compression. In addition, a mask loss is incorporated to guide the encoder to embed the watermark in regions where changes are imperceptible to humans, thus improving the perceptual quality of the watermarked video. Experimental results demonstrate that the proposed scheme outperforms other state-of-the-art methods while achieving 10x faster inference.
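To make the temporal attacks named above concrete, the following is a minimal sketch of frame averaging and frame dropping on a short clip. It is not the distortion layer of REVMark: the function names `frame_average` and `frame_drop`, the window size, the drop probability, and the choice to repeat the previous frame when one is dropped are all assumptions made for illustration. Frames are represented as flat lists of pixel values to keep the example dependency-free.

```python
import random


def frame_average(frames, window=3):
    """Replace each frame with the mean of a temporal window centred on it.

    `window` is a hypothetical parameter; windows are truncated at clip edges.
    """
    half = window // 2
    out = []
    for t in range(len(frames)):
        lo, hi = max(0, t - half), min(len(frames), t + half + 1)
        neighbours = frames[lo:hi]
        # Average pixel by pixel across the frames in the window.
        out.append([sum(px) / len(neighbours) for px in zip(*neighbours)])
    return out


def frame_drop(frames, drop_prob=0.25, rng=None):
    """Drop each frame (except the first) with probability `drop_prob`.

    Assumption: a dropped frame is replaced by the previous output frame,
    so the clip keeps its original length.
    """
    if rng is None:
        rng = random.Random(0)
    out = []
    for t, frame in enumerate(frames):
        if t > 0 and rng.random() < drop_prob:
            out.append(list(out[-1]))  # repeat the previous frame
        else:
            out.append(list(frame))
    return out


# A three-frame clip with two pixels per frame.
clip = [[0.0, 0.0], [3.0, 6.0], [6.0, 12.0]]
averaged = frame_average(clip)
dropped = frame_drop(clip, drop_prob=1.0)
```

Both operations mix or discard information along the time axis, which is why a watermark embedded independently in each frame tends to be fragile against them.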