94 references in total
- [1] He Kaiming, Zhang Xiangyu, Ren Shaoqing, et al., Deep residual learning for image recognition [C], Proc of the IEEE Conf on Computer Vision and Pattern Recognition, pp. 770-778, (2016)
- [2] Simonyan K, Zisserman A., Very deep convolutional networks for large-scale image recognition [J], (2015)
- [3] Krizhevsky A, Sutskever I, Hinton G E., ImageNet classification with deep convolutional neural networks [C], Proc of the 26th Advances in Neural Information Processing Systems, pp. 1106-1114, (2012)
- [4] Brown T B, Mann B, Ryder N, et al., Language models are few-shot learners [C], Proc of the 34th Advances in Neural Information Processing Systems, pp. 1877-1901, (2020)
- [5] Devlin J, Chang Mingwei, Lee K, et al., BERT: Pre-training of deep bidirectional transformers for language understanding [C], Proc of the Conf of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 4171-4186, (2019)
- [6] Vaswani A, Shazeer N, Parmar N, et al., Attention is all you need [C], Proc of the 31st Advances in Neural Information Processing Systems, pp. 5998-6008, (2017)
- [7] Wu Haiping, Chen Yuntao, Wang Naiyan, et al., Sequence level semantics aggregation for video object detection [C], Proc of the IEEE Int Conf on Computer Vision, pp. 9217-9225, (2019)
- [8] Deng Jiajun, Pan Yingwei, Yao Ting, et al., Single shot video object detector [J], IEEE Transactions on Multimedia, 23, pp. 846-858, (2021)
- [9] Shvets M, Liu Wei, Berg A C., Leveraging long-range temporal relationships between proposals for video object detection [C], Proc of the IEEE Int Conf on Computer Vision, pp. 9756-9764, (2019)
- [10] Russakovsky O, Deng Jia, Su H, et al., ImageNet large scale visual recognition challenge [J], International Journal of Computer Vision, 115, 3, (2015)