Multichannel environmental sound segmentation with separately trained spectral and spatial features

Cited by: 0
Authors
Yui Sudo
Katsutoshi Itoyama
Kenji Nishida
Kazuhiro Nakadai
Affiliations
[1] Tokyo Institute of Technology, Department of Systems and Control Engineering, School of Engineering
[2] Honda Research Institute Japan Co., Ltd.
Source
Applied Intelligence | 2021 / Vol. 51
Keywords
Environmental sound segmentation; Sound source separation; Inter-channel phase difference; Semantic segmentation
DOI
Not available
Abstract
This paper proposes a multichannel environmental sound segmentation method. Environmental sound segmentation is an integrated approach that performs sound source localization, sound source separation, and classification simultaneously. When multiple microphones are available, spatial features can be used to improve the localization and separation accuracy of sounds arriving from different directions; however, conventional methods have three drawbacks: (a) When sound source localization and separation using spatial features and classification using spectral features are trained in the same neural network, the network may overfit to the relationship between the direction of arrival and the class of a sound, reducing its reliability on novel events. (b) Although permutation invariant training, as used in automatic speech recognition, could be extended, it is impractical for environmental sounds, which can include an unlimited number of sound sources. (c) Various features, such as the complex values of the short-time Fourier transform and inter-channel phase differences, have been used as spatial features, but no study has compared them. This paper proposes a multichannel environmental sound segmentation method comprising two discrete blocks: a sound source localization and separation block, and a sound source separation and classification block. Separating the blocks avoids overfitting to the relationship between the direction of arrival and the class. Simulation experiments on constructed datasets containing 75 classes of environmental sounds showed that the root mean squared error of the proposed method was lower than that of conventional methods.
Pages: 8245-8259
Page count: 14