35 entries in total
[1]
Afouras T., 2018, arXiv preprint arXiv:1809.00496
[3]
On the Role of Visual Cues in Audiovisual Speech Enhancement [J]. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021), 2021: 8423-8427
[5]
FaceFilter: Audio-visual speech separation using still images [J]. INTERSPEECH 2020, 2020: 3481-3485
[6]
Elminshawi M., 2022, arXiv preprint arXiv:2202.00733
[7]
Gao R.H., 2021, CVPR, p. 15490, DOI 10.1109/CVPR46437.2021.01524
[8]
SpEx+: A Complete Time Domain Speaker Extraction Network [J]. INTERSPEECH 2020, 2020: 1406-1410
[9]
Multi-Stage Speaker Extraction with Utterance and Frame-Level Reference Signals [J]. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021), 2021: 6109-6113