73 entries in total
[41]
Narita G, 2019, Arxiv, DOI arXiv:1903.01177
[42]
Nguyen A, 2018, Arxiv, DOI arXiv:1803.06152
[43]
Paszke A, 2016, Arxiv, DOI 10.48550/arXiv.1606.02147
[44]
Pennington J, 2014, P 2014 C EMP METH NA, DOI 10.3115/v1/D14-1162
[45]
Conditional Image-Text Embedding Networks [J]. COMPUTER VISION - ECCV 2018, PT XII, 2018, 11216: 258-274
[46]
Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015: 2641-2649
[47]
Prabhudesai M, 2021, Arxiv, DOI arXiv:1910.01210
[48]
Deep Hough Voting for 3D Object Detection in Point Clouds [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 9276-9285
[49]
Qi Charles Ruizhongtai, 2017, PROC 31 INT C NEURAL
[50]
REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments [J]. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020: 9979-9988