51 entries in total
[1]
Adi Y, 2019, INT CONF ACOUST SPEE, P3742, DOI 10.1109/ICASSP.2019.8682468
[2]
Video and Text Matching with Conditioned Embeddings [J]. 2022 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2022), 2022: 478-487
[3]
Bock S., 2013, DAFx-13
[4]
Brown TJ., 2020, [No title captured]
[5]
Sound2Sight: Generating Visual Dynamics from Sound and Context [J]. COMPUTER VISION - ECCV 2020, PT XXVII, 2020, 12372: 701-719
[6]
Chen S., 2022, arXiv
[7]
Copet J, 2024, arXiv, arXiv:2306.05284, DOI 10.48550/arXiv.2306.05284
[8]
VQGAN-CLIP: Open Domain Image Generation and Editing with Natural Language Guidance [J]. COMPUTER VISION - ECCV 2022, PT XXXVII, 2022, 13697: 88-105
[9]
Taming Transformers for High-Resolution Image Synthesis [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021: 12868-12878
[10]
Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors [J]. COMPUTER VISION - ECCV 2022, PT XV, 2022, 13675: 89-106