Weakly-Supervised Audio-Visual Segmentation

Citations: 0
Authors
Mo, Shentong [1, 2]
Raj, Bhiksha [1, 2]
Affiliations
[1] CMU, Pittsburgh, PA 15213 USA
[2] MBZUAI, Abu Dhabi, United Arab Emirates
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023) | 2023
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Audio-visual segmentation is a challenging task that aims to predict pixel-level masks for sound sources in a video. Previous work applied a comprehensive manually designed architecture with countless pixel-wise accurate masks as supervision. However, these pixel-level masks are expensive and not available in all cases. In this work, we aim to simplify the supervision as the instance-level annotation, i.e., weakly-supervised audio-visual segmentation. We present a novel Weakly-Supervised Audio-Visual Segmentation framework, namely WS-AVS, that can learn multi-scale audio-visual alignment with multi-scale multiple-instance contrastive learning for audio-visual segmentation. Extensive experiments on AVS-Bench demonstrate the effectiveness of our WS-AVS in the weakly-supervised audio-visual segmentation of single-source and multi-source scenarios.
Pages: 14
Related papers
50 in total
[31]   A Weakly-Supervised Approach for Semantic Segmentation [J].
Feng, Yanqing ;
Wang, Lunwen .
PROCEEDINGS OF 2019 IEEE 3RD INFORMATION TECHNOLOGY, NETWORKING, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (ITNEC 2019), 2019, :2311-2314
[32]   Weakly-Supervised Semantic Segmentation with Visual Words Learning and Hybrid Pooling [J].
Ru, Lixiang ;
Du, Bo ;
Zhan, Yibing ;
Wu, Chen .
INTERNATIONAL JOURNAL OF COMPUTER VISION, 2022, 130 (04) :1127-1144
[34]   Weakly-supervised learning of visual relations [J].
Peyre, Julia ;
Laptev, Ivan ;
Schmid, Cordelia ;
Sivic, Josef .
2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, :5189-5198
[35]   Audio-Visual Segmentation with Semantics [J].
Zhou, Jinxing ;
Shen, Xuyang ;
Wang, Jianyuan ;
Zhang, Jiayi ;
Sun, Weixuan ;
Zhang, Jing ;
Birchfield, Stan ;
Guo, Dan ;
Kong, Lingpeng ;
Wang, Meng ;
Zhong, Yiran .
INTERNATIONAL JOURNAL OF COMPUTER VISION, 2025, 133 (04) :1644-1664
[36]   DUAL SPACE EMBEDDING LEARNING FOR WEAKLY SUPERVISED AUDIO-VISUAL VIOLENCE DETECTION [J].
Liu, Yiran ;
Wu, Zhanjie ;
Mo, Mengjingcheng ;
Gan, Ji ;
Leng, Jiaxu ;
Gao, Xinbo .
2024 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME 2024, 2024,
[37]   On Regularized Losses for Weakly-supervised CNN Segmentation [J].
Tang, Meng ;
Perazzi, Federico ;
Djelouah, Abdelaziz ;
Ben Ayed, Ismail ;
Schroers, Christopher ;
Boykov, Yuri .
COMPUTER VISION - ECCV 2018, PT XVI, 2018, 11220 :524-540
[38]   Weakly-Supervised RGBD Video Object Segmentation [J].
Yang, Jinyu ;
Gao, Mingqi ;
Zheng, Feng ;
Zhen, Xiantong ;
Ji, Rongrong ;
Shao, Ling ;
Leonardis, Ales .
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33 :2158-2170
[39]   IMPORTANCE SAMPLING CAMS FOR WEAKLY-SUPERVISED SEGMENTATION [J].
Jonnarth, Arvi ;
Felsberg, Michael .
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, :2639-2643
[40]   Token Contrast for Weakly-Supervised Semantic Segmentation [J].
Ru, Lixiang ;
Zheng, Hehang ;
Zhan, Yibing ;
Du, Bo .
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, :3093-3102