50 entries in total
- [21] MAVT-FG: Multimodal Audio-Visual Transformer for Weakly-supervised Fine-Grained Recognition. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022: 3811-3819
- [24] Weakly-supervised Disentanglement Network for Video Fingerspelling Detection. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022: 5446-5455
- [25] Audio-Visual Weakly Supervised Approach for Apathy Detection in the Elderly. 2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020
- [27] Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023: 18827-18836
- [28] Label-Anticipated Event Disentanglement for Audio-Visual Video Parsing. COMPUTER VISION - ECCV 2024, PT X, 2025, 15068: 35-51
- [29] Modality-Aware Contrastive Instance Learning with Self-Distillation for Weakly-Supervised Audio-Visual Violence Detection. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022: 6278-6287