Augmented Strategy For Polyphonic Sound Event Detection

Cited by: 0
Authors
Wang, Bolun [1]
Fu, Zhong-Hua [1, 2]
Wu, Hao [1]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Xian, Peoples R China
[2] Xian IFLYTEK Hyper Brain Informat Technol Co Ltd, Xian, Peoples R China
Source
2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) | 2019
Keywords
Sound event detection; Data augmentation; Model fusion; Acoustic scenes; Classification
DOI
Not available
Chinese Library Classification number
TP31 [Computer software]
Subject classification code
081202; 0835
Abstract
Sound event detection is important for many applications, such as audio content retrieval, intelligent monitoring, and scene-based interaction. Traditional studies on this topic mainly focus on identifying a single sound event class. In real applications, however, several sound events usually occur concurrently and with different durations. This leads to a new detection task: polyphonic sound event classification together with estimation of event time boundaries. In this paper, we propose an augmented strategy for this task, which faces the challenge of a large amount of unbalanced and weakly labelled training data. Specifically, the strategy includes data augmentation that enriches the training set to eliminate data imbalance, a new loss function that combines cross entropy and F-score, and model fusion that integrates the strengths of different classifiers. The strategy is validated on the DCASE2019 dataset, and both event-based and segment-based detection are significantly improved over the baseline system.
Pages: 1496-1500
Number of pages: 5