48 entries in total
[1]
[Anonymous], 2014, arXiv:1409.0473
[2]
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 4724-4733
[3]
Caruana R, 2001, ADV NEUR IN, V13, P402
[4]
Chen JY, 2019, AAAI CONF ARTIF INTE, P8175
[5]
Chen SX, 2019, AAAI CONF ARTIF INTE, P8199
[6]
Dang LH, 2021, PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, P636
[7]
Devlin J, 2019, 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, P4171
[8]
Dosovitskiy A, 2021, arXiv, DOI arXiv:2010.11929
[9]
MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023: 14773-14783
[10]
MAC: Mining Activity Concepts for Language-based Temporal Localization [J]. 2019 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2019: 245-253