Weakly-Supervised Audio-Visual Segmentation

Authors
Mo, Shentong [1, 2]
Raj, Bhiksha [1, 2]
Affiliations
[1] CMU, Pittsburgh, PA 15213 USA
[2] MBZUAI, Abu Dhabi, United Arab Emirates
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023) | 2023
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Audio-visual segmentation is a challenging task that aims to predict pixel-level masks for sound sources in a video. Previous work applied a comprehensive manually designed architecture with countless pixel-wise accurate masks as supervision. However, these pixel-level masks are expensive and not available in all cases. In this work, we aim to simplify the supervision as the instance-level annotation, i.e., weakly-supervised audio-visual segmentation. We present a novel Weakly-Supervised Audio-Visual Segmentation framework, namely WS-AVS, that can learn multi-scale audio-visual alignment with multi-scale multiple-instance contrastive learning for audio-visual segmentation. Extensive experiments on AVS-Bench demonstrate the effectiveness of our WS-AVS in the weakly-supervised audio-visual segmentation of single-source and multi-source scenarios.
Pages: 14
Related papers
50 in total
  • [1] A Closer Look at Weakly-Supervised Audio-Visual Source Localization
    Mo, Shentong
    Morgado, Pedro
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [2] Exploring Heterogeneous Clues for Weakly-Supervised Audio-Visual Video Parsing
    Wu, Yu
    Yang, Yi
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 1326 - 1335
  • [3] Boosting Positive Segments for Weakly-Supervised Audio-Visual Video Parsing
    Rachavarapu, Kranthi Kumar
    Rajagopalan, A. N.
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 10158 - 10168
  • [4] Revisit Weakly-Supervised Audio-Visual Video Parsing from the Language Perspective
    Fan, Yingying
    Wu, Yu
    Du, Bo
    Lin, Yutian
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [5] Modality-Independent Teachers Meet Weakly-Supervised Audio-Visual Event Parser
    Lai, Yung-Hsuan
    Chen, Yen-Chun
    Wang, Yu-Chiang Frank
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [6] Joint-Modal Label Denoising for Weakly-Supervised Audio-Visual Video Parsing
    Cheng, Haoyue
    Liu, Zhaoyang
    Zhou, Hang
    Qian, Chen
    Wu, Wayne
    Wang, Limin
    COMPUTER VISION, ECCV 2022, PT XXXIV, 2022, 13694 : 431 - 448
  • [7] DHHN: Dual Hierarchical Hybrid Network for Weakly-Supervised Audio-Visual Video Parsing
    Jiang, Xun
    Xu, Xing
    Chen, Zhiguo
    Zhang, Jingran
    Song, Jingkuan
    Shen, Fumin
    Lu, Huimin
    Shen, Heng Tao
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022
  • [8] Multi-modal Grouping Network for Weakly-Supervised Audio-Visual Video Parsing
    Mo, Shentong
    Tian, Yapeng
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [9] Weakly Supervised Audio-Visual Violence Detection
    Wu, Peng
    Liu, Xiaotao
    Liu, Jing
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 1674 - 1685