SARdB: A dataset for audio scene source counting and analysis

Cited by: 5
Authors
Nigro, Michael [1]
Krishnan, Sridhar [1]
Affiliations
[1] Ryerson Univ, Dept Elect Comp & Biomed Engn, 350 Victoria St, Toronto, ON M5B 2K3, Canada
Keywords
Source counting; Speaker count estimation; Audio scene analysis; Speaker diarization; Sound event detection; DIARIZATION;
DOI
10.1016/j.apacoust.2021.107985
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Determining the number of sources in a signal is an important consideration for many audio scene analysis tasks. However, source counting is not as actively researched as many other audio tasks. This work creates SARdB, a multimodal audio-text dataset from Ryerson University's Signal Analysis Research (SAR) group, with the goal of promoting research on source counting and audio scene analysis. SARdB consists of 10 s long acoustic scenes containing between 1 and 4 speakers and 0 to 5 sound events, for a total of approximately 21 hours of data. We demonstrate the utility of performing source counting and how it can benefit audio scene analysis tasks in general. Crown Copyright (C) 2021 Published by Elsevier Ltd. All rights reserved.
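As a rough illustration of the scale described in the abstract, the sketch below estimates how many 10 s scenes fit in about 21 hours and enumerates the possible (speaker count, sound event count) label pairs. The function and variable names are hypothetical, the total duration is treated as exactly 21 hours for the arithmetic, and the abstract does not say whether every label pair actually occurs in SARdB.

```python
# Back-of-the-envelope sketch; all names here are illustrative assumptions,
# not part of the SARdB release.
CLIP_SECONDS = 10   # scene length stated in the abstract
TOTAL_HOURS = 21    # approximate total stated in the abstract


def approx_clip_count(total_hours: float, clip_seconds: float) -> int:
    """Estimate the number of fixed-length clips in a given total duration."""
    return round(total_hours * 3600 / clip_seconds)


# Label ranges stated in the abstract: 1 to 4 speakers, 0 to 5 sound events.
speaker_counts = list(range(1, 5))
event_counts = list(range(0, 6))

# Every possible (speakers, events) pair; whether all of them appear in the
# dataset is an assumption, not something the abstract confirms.
label_pairs = [(s, e) for s in speaker_counts for e in event_counts]

if __name__ == "__main__":
    print(approx_clip_count(TOTAL_HOURS, CLIP_SECONDS))
    print(len(label_pairs))
```

Under these assumptions the dataset would hold on the order of several thousand scenes, which is why the total is best read as an estimate rather than an exact clip count.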
Pages: 5
Related papers
  • [1] Trends in audio scene source counting and analysis
    Nigro, Michael
    Krishnan, Sridhar
    MACHINE LEARNING WITH APPLICATIONS, 2024, 18
  • [2] Multimodal System for Audio Scene Source Counting and Analysis
    Nigro, Michael
    Krishnan, Sridhar
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 1073 - 1082
  • [3] A CURATED DATASET OF URBAN SCENES FOR AUDIO-VISUAL SCENE ANALYSIS
    Wang, Shanshan
    Mesaros, Annamaria
    Heittola, Toni
    Virtanen, Tuomas
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 626 - 630
  • [4] SceneFake: An initial dataset and benchmarks for scene fake audio detection
    Yi, Jiangyan
    Wang, Chenglong
    Tao, Jianhua
    Zhang, Chu Yuan
    Fan, Cunhang
    Tian, Zhengkun
    Ma, Haoxin
    Fu, Ruibo
    PATTERN RECOGNITION, 2024, 152
  • [5] Visual Scene Graphs for Audio Source Separation
    Chatterjee, Moitreya
    Le Roux, Jonathan
    Ahuja, Narendra
    Cherian, Anoop
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 1184 - 1193
  • [6] An Audio-Visual Dataset and Deep Learning Frameworks for Crowded Scene Classification
    Pham, Lam
    Ngo, Dat
    Nguyen, Thi Ngoc Tho
    Nguyen, Phu X.
    Hoang, Truong
    Schindler, Alexander
    19TH INTERNATIONAL CONFERENCE ON CONTENT-BASED MULTIMEDIA INDEXING, CBMI 2022, 2022, : 23 - 28
  • [7] A3CarScene: An audio-visual dataset for driving scene understanding
    Cantarini, Michela
    Gabrielli, Leonardo
    Mancini, Adriano
    Squartini, Stefano
    Longo, Roberto
    DATA IN BRIEF, 2023, 48
  • [8] Learning Audio-Visual Dynamics Using Scene Graphs for Audio Source Separation
    Chatterjee, Moitreya
    Ahuja, Narendra
    Cherian, Anoop
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [9] Learning long-term filter banks for audio source separation and audio scene classification
    Zhang, Teng
    Wu, Ji
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2018