23 records in total
[1]
Alayrac JB, 2022, ADV NEUR IN
[2]
HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection [J]. 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022: 646-650
[3]
Elizalde B, 2022, arXiv, DOI arXiv:2206.04769
[4]
Fonseca E., 2018, P WORKSH DET REC WIL, P69
[6]
Gagnon-Audet J.-C., 2023, ICLR 2023 WORKSH MAT
[7]
AudioCLIP: Extending CLIP to Image, Text and Audio [J]. 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022: 976-980
[8]
Hershey S, 2017, INT CONF ACOUST SPEE, P131, DOI 10.1109/ICASSP.2017.7952132
[9]
Jaegle A., 2022, Perceiver IO: A general architecture for structured inputs & outputs
[10]
Koch G., 2015, ICML DEEP LEARN WORK, V2