Weakly-Supervised Audio-Visual Segmentation

Times cited: 0

Authors:

Mo, Shentong [1, 2]

Raj, Bhiksha [1, 2]

Affiliations:

[1] CMU, Pittsburgh, PA 15213 USA

[2] MBZUAI, Abu Dhabi, U Arab Emirates

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023) | 2023

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Audio-visual segmentation is a challenging task that aims to predict pixel-level masks for sound sources in a video. Previous work applied a comprehensive, manually designed architecture supervised with a large number of pixel-accurate masks. However, these pixel-level masks are expensive to obtain and are not available in all cases. In this work, we aim to simplify the supervision to instance-level annotations, i.e., weakly-supervised audio-visual segmentation. We present a novel Weakly-Supervised Audio-Visual Segmentation framework, namely WS-AVS, that learns multi-scale audio-visual alignment through multi-scale multiple-instance contrastive learning for audio-visual segmentation. Extensive experiments on AVSBench demonstrate the effectiveness of WS-AVS for weakly-supervised audio-visual segmentation in both single-source and multi-source scenarios.
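The abstract names multiple-instance contrastive learning without giving its form. As a rough illustration only, here is a minimal sketch of a generic multiple-instance contrastive loss of the kind often used for audio-visual alignment; it is not the WS-AVS implementation. It assumes that each frame's visual features form a "bag" of spatial instances, that the bag-level score is the maximum cosine similarity between the clip's audio embedding and any instance, and that the loss is InfoNCE over the batch. The function names, the max-pooling choice, the temperature of 0.07 and the plain averaging across scales are all hypothetical choices made for this example.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _normalize(v: Vector) -> List[float]:
    """L2-normalize a vector (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def bag_similarity(audio: Vector, bag: Sequence[Vector]) -> float:
    """Bag-level score: max cosine similarity between the audio embedding
    and any instance (spatial location) in the visual bag.
    Max-pooling is an assumption made for this sketch."""
    a = _normalize(audio)
    return max(sum(x * y for x, y in zip(a, _normalize(p))) for p in bag)


def mil_contrastive_loss(audios: Sequence[Vector],
                         bags: Sequence[Sequence[Vector]],
                         tau: float = 0.07) -> float:
    """InfoNCE over the batch: (audio_i, bag_i) is the positive pair and
    every bag_j with j != i serves as a negative."""
    n = len(audios)
    total = 0.0
    for i in range(n):
        logits = [bag_similarity(audios[i], bags[j]) / tau for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # equals -log softmax(logits)[i]
    return total / n


def multi_scale_loss(audios: Sequence[Vector],
                     bags_per_scale: Sequence[Sequence[Sequence[Vector]]],
                     tau: float = 0.07) -> float:
    """Hypothetical multi-scale variant: average the MIL loss over feature
    maps of different resolutions, each flattened into per-clip bags."""
    losses = [mil_contrastive_loss(audios, bags, tau) for bags in bags_per_scale]
    return sum(losses) / len(losses)


# Toy batch: two clips with 2-D embeddings (values invented for illustration).
audios = [[1.0, 0.0], [0.0, 1.0]]
bag0 = [[1.0, 0.0], [0.0, -1.0]]   # one location matches clip 0's audio
bag1 = [[0.0, 1.0], [-1.0, 0.0]]   # one location matches clip 1's audio

matched = mil_contrastive_loss(audios, [bag0, bag1])
swapped = mil_contrastive_loss(audios, [bag1, bag0])
print(f"matched={matched:.6f} swapped={swapped:.2f}")
```

With these toy clips, correctly paired bags give a near-zero loss, while swapping the bags gives a loss of about 1/tau (roughly 14.3). Only the best-matching location in each bag drives the score, which is why instance-level (clip-level) pairing can stand in for pixel-level masks during training.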

Pages: 14

References: 50

