Weakly-Supervised Audio-Visual Segmentation

Citations: 0

Authors:

Mo, Shentong ^{[1,2]}

Raj, Bhiksha ^{[1,2]}

Affiliations:

[1] CMU, Pittsburgh, PA 15213 USA

[2] MBZUAI, Abu Dhabi, United Arab Emirates

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023) | 2023

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Audio-visual segmentation is a challenging task that aims to predict pixel-level masks for sound sources in a video. Previous work applied a comprehensive manually designed architecture with countless pixel-wise accurate masks as supervision. However, these pixel-level masks are expensive and not available in all cases. In this work, we aim to simplify the supervision as the instance-level annotation, i.e., weakly-supervised audio-visual segmentation. We present a novel Weakly-Supervised Audio-Visual Segmentation framework, namely WS-AVS, that can learn multi-scale audio-visual alignment with multi-scale multiple-instance contrastive learning for audio-visual segmentation. Extensive experiments on AVS-Bench demonstrate the effectiveness of our WS-AVS in the weakly-supervised audio-visual segmentation of single-source and multi-source scenarios.

Pages: 14

50 items in total

[31] A Weakly-Supervised Approach for Semantic Segmentation [J].

Feng, Yanqing ;

Wang, Lunwen .

PROCEEDINGS OF 2019 IEEE 3RD INFORMATION TECHNOLOGY, NETWORKING, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (ITNEC 2019), 2019, :2311-2314

[32] Weakly-Supervised Semantic Segmentation with Visual Words Learning and Hybrid Pooling [J].

Ru, Lixiang ;

Du, Bo ;

Zhan, Yibing ;

Wu, Chen .

INTERNATIONAL JOURNAL OF COMPUTER VISION, 2022, 130 (04) :1127-1144

[34] Weakly-supervised learning of visual relations [J].

Peyre, Julia ;

Laptev, Ivan ;

Schmid, Cordelia ;

Sivic, Josef .

2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, :5189-5198

[35] Audio-Visual Segmentation with Semantics [J].

Zhou, Jinxing ;

Shen, Xuyang ;

Wang, Jianyuan ;

Zhang, Jiayi ;

Sun, Weixuan ;

Zhang, Jing ;

Birchfield, Stan ;

Guo, Dan ;

Kong, Lingpeng ;

Wang, Meng ;

Zhong, Yiran .

INTERNATIONAL JOURNAL OF COMPUTER VISION, 2025, 133 (04) :1644-1664

[36] DUAL SPACE EMBEDDING LEARNING FOR WEAKLY SUPERVISED AUDIO-VISUAL VIOLENCE DETECTION [J].

Liu, Yiran ;

Wu, Zhanjie ;

Mo, Mengjingcheng ;

Gan, Ji ;

Leng, Jiaxu ;

Gao, Xinbo .

2024 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME 2024, 2024,

[37] On Regularized Losses for Weakly-supervised CNN Segmentation [J].

Tang, Meng ;

Perazzi, Federico ;

Djelouah, Abdelaziz ;

Ben Ayed, Ismail ;

Schroers, Christopher ;

Boykov, Yuri .

COMPUTER VISION - ECCV 2018, PT XVI, 2018, 11220 :524-540

[38] Weakly-Supervised RGBD Video Object Segmentation [J].

Yang, Jinyu ;

Gao, Mingqi ;

Zheng, Feng ;

Zhen, Xiantong ;

Ji, Rongrong ;

Shao, Ling ;

Leonardis, Ales .

IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33 :2158-2170

[39] IMPORTANCE SAMPLING CAMS FOR WEAKLY-SUPERVISED SEGMENTATION [J].

Jonnarth, Arvi ;

Felsberg, Michael .

2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, :2639-2643

[40] WEAKLY-SUPERVISED PLATE AND FOOD REGION SEGMENTATION [J].

Shimoda, Wataru ;

Yanai, Keiji .

2020 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2020,
