50 items in total
- [31] UniCon: Unified Context Network for Robust Active Speaker Detection [J]. PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 3964 - 3972
- [32] Learning Lip-Based Audio-Visual Speaker Embeddings with AV-HuBERT [J]. INTERSPEECH 2022, 2022, : 4785 - 4789
- [33] ACTIVE SPEAKER DETECTION IN HUMAN MACHINE MULTIPARTY DIALOGUE USING VISUAL PROSODY INFORMATION [J]. 2016 IEEE GLOBAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING (GLOBALSIP), 2016, : 1207 - 1211
- [34] AVQA: A Dataset for Audio-Visual Question Answering on Videos [J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 3480 - 3491
- [35] Uncertainty-Guided End-to-End Audio-Visual Speaker Diarization for Far-Field Recordings [J]. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 4031 - 4041
- [36] Getting More for Less: Using Weak Labels and AV-Mixup for Robust Audio-Visual Speaker Verification [J]. INTERSPEECH 2024, 2024, : 4728 - 4732
- [37] AUDIO-VISUAL SPEECH ENHANCEMENT METHOD CONDITIONED ON THE LIP MOTION AND SPEAKER-DISCRIMINATIVE EMBEDDINGS [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6668 - 6672
- [38] Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation [J]. ACM TRANSACTIONS ON GRAPHICS, 2018, 37 (04):
- [39] An Attention Based Speaker-Independent Audio-Visual Deep Learning Model for Speech Enhancement [J]. MULTIMEDIA MODELING (MMM 2020), PT II, 2020, 11962 : 722 - 728
- [40] THE XMUSPEECH SYSTEM FOR AUDIO-VISUAL TARGET SPEAKER EXTRACTION IN MISP 2023 CHALLENGE [J]. 2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW 2024, 2024, : 39 - 40