67 entries in total
[1] Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 4971-4980
[2] Alayrac JB, 2022, arXiv, DOI [arXiv:2204.14198, 10.48550/arXiv.2204.14198]
[3] VQA: Visual Question Answering [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015: 2425-2433
[4] Bigham Jeffrey P, 2010, P 23RD ANN ACM S USE, P333
[5] Revisiting the "Video" in Video-Language Understanding [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022: 2907-2917
[6] Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 4724-4733
[7] Cherian A, 2022, AAAI CONF ARTIF INTE, P444
[8] Spatial-Temporal Transformer for Dynamic Scene Graph Generation [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021: 16352-16362
[10] Dang LH, 2021, arXiv, DOI arXiv:2106.13432