SARdB: A dataset for audio scene source counting and analysis

Cited by: 5
Authors
Nigro, Michael [1]
Krishnan, Sridhar [1]
Affiliations
[1] Ryerson Univ, Dept Elect Comp & Biomed Engn, 350 Victoria St, Toronto, ON M5B 2K3, Canada
Keywords
Source counting; Speaker count estimation; Audio scene analysis; Speaker diarization; Sound event detection; DIARIZATION
DOI
10.1016/j.apacoust.2021.107985
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Determining the number of sources in a signal is an important consideration for many audio scene analysis tasks, yet source counting is not as actively researched as many other audio tasks. This work presents SARdB, created by Ryerson University's Signal Analysis Research (SAR) group: a multimodal audio-text dataset intended to promote research on source counting and audio scene analysis. SARdB consists of 10 s long acoustic scenes, each containing between 1 and 4 speakers and between 0 and 5 sound events, for a total of approximately 21 hours of data. We demonstrate the utility of performing source counting and how it can benefit audio scene analysis tasks in general. Crown Copyright (C) 2021 Published by Elsevier Ltd. All rights reserved.
Pages: 5