35 entries in total
[1]
Language Features Matter: Effective Language Representations for Vision-Language Tasks [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 7473-7482
[2]
Carreira J., 2017, CVPR NEWMODEL KINETI
[3]
Devlin J, 2019, 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, P4171
[4]
Gabeur V., 2020, CVPR VID PENT WORKSH
[5]
Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input [J]. COMPUTER VISION - ECCV 2018, PT VI, 2018, 11210: 659-677
[6]
Hershey S, 2017, INT CONF ACOUST SPEE, P131, DOI 10.1109/ICASSP.2017.7952132
[7]
Hochreiter S, 1997, NEURAL COMPUT, V9, P1735, DOI 10.1162/neco.1997.9.8.1735
[9]
Densely Connected Convolutional Networks [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 2261-2269
[10]
Karpathy A, 2014, ADV NEUR IN, V27