35 entries in total
- [12] ProDiff: Progressive Fast Diffusion Model for High-Quality Text-to-Speech [J]. Proceedings of the 30th ACM International Conference on Multimedia, MM 2022, 2022: 2595-2605
- [13] Huang Z., 2023, P ICML
- [14] Ikawa S., 2018, DETECTION CLASSIFICA, P59
- [15] Kim CD, 2019, 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT 2019), Vol. 1, P119
- [16] Audio Retrieval With Natural Language Queries: A Benchmark Study [J]. IEEE Transactions on Multimedia, 2023, 25: 2675-2685
- [18] A General Framework for Incomplete Cross-Modal Retrieval with Missing Labels and Missing Modalities [J]. 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022: 4763-4767
- [19] Liu Y., 2019, P BMVC
- [20] Audio-Text Retrieval in Context [J]. 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022: 4793-4797