Weakly-Supervised Audio-Visual Segmentation

Cited by: 0
Authors
Mo, Shentong [1 ,2 ]
Raj, Bhiksha [1 ,2 ]
Affiliations
[1] CMU, Pittsburgh, PA 15213 USA
[2] MBZUAI, Abu Dhabi, U Arab Emirates
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023) | 2023
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Audio-visual segmentation is a challenging task that aims to predict pixel-level masks for sound sources in a video. Previous work applied a comprehensive manually designed architecture with countless pixel-wise accurate masks as supervision. However, these pixel-level masks are expensive and not available in all cases. In this work, we aim to simplify the supervision as the instance-level annotation, i.e., weakly-supervised audio-visual segmentation. We present a novel Weakly-Supervised Audio-Visual Segmentation framework, namely WS-AVS, that can learn multi-scale audio-visual alignment with multi-scale multiple-instance contrastive learning for audio-visual segmentation. Extensive experiments on AVS-Bench demonstrate the effectiveness of our WS-AVS in the weakly-supervised audio-visual segmentation of single-source and multi-source scenarios.
Pages: 14
Related papers
50 in total
[41]   Rethinking CAM in Weakly-Supervised Semantic Segmentation [J].
Song, Yuqi ;
Li, Xiaojie ;
Shi, Canghong ;
Feng, Shihao ;
Wang, Xin ;
Luo, Yong ;
Xi, Wu .
IEEE ACCESS, 2022, 10 :126440-126450
[42]   WEAKLY-SUPERVISED PLATE AND FOOD REGION SEGMENTATION [J].
Shimoda, Wataru ;
Yanai, Keiji .
2020 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2020,
[43]   Weakly-Supervised Action Detection Guided by Audio Narration [J].
Ye, Keren ;
Kovashka, Adriana .
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022, 2022, :1527-1537
[44]   Predicting Segmentation "Easiness" from the Consistency for Weakly-Supervised Segmentation [J].
Shimoda, Wataru ;
Yanai, Keiji .
PROCEEDINGS 2017 4TH IAPR ASIAN CONFERENCE ON PATTERN RECOGNITION (ACPR), 2017, :292-297
[45]   Weakly-supervised Discovery of Visual Pattern Configurations [J].
Song, Hyun Oh ;
Lee, Yong Jae ;
Jegelka, Stefanie ;
Darrell, Trevor .
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 27 (NIPS 2014), 2014, 27
[46]   AVSegFormer: Audio-Visual Segmentation with Transformer [J].
Gao, Shengyi ;
Chen, Zhe ;
Chen, Guo ;
Wang, Wenhai ;
Lu, Tong .
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 11, 2024, :12155-12163
[47]   Weakly-Supervised Dual Clustering for Image Semantic Segmentation [J].
Liu, Yang ;
Liu, Jing ;
Li, Zechao ;
Tang, Jinhui ;
Lu, Hanqing .
2013 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2013, :2075-2082
[48]   Discriminative Region Suppression for Weakly-Supervised Semantic Segmentation [J].
Kim, Beomyoung ;
Han, Sangeun ;
Kim, Junmo .
THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 :1754-1761
[49]   Unsupervised Audio-Visual Lecture Segmentation [J].
Singh, S. Darshan ;
Gupta, Anchit ;
Jawahar, C. V. ;
Tapaswi, Makarand .
2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, :5221-5230
[50]   Weakly-Supervised Ultrasound Video Segmentation with Minimal Annotations [J].
Chang, Ruiheng ;
Wang, Dong ;
Guo, Haiyan ;
Ding, Jia ;
Wang, Liwei .
MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2021, PT VIII, 2021, 12908 :648-658