Multi-Channel Audio Source Separation Using Azimuth-Frequency Analysis and Convolutional Neural Network

Authors
Moon, Jung Min [1]
Kim, Jun Ho [1]
Kim, Tae Woo [1]
Chun, Chan Jun [2]
Kim, Hong Kook [1]
Affiliations
[1] GIST, Sch Elect Engn & Comp Sci, Gwangju 61005, South Korea
[2] Korea Inst Civil Engn & Bldg Technol KICT, Construct Automat Res Ctr, Goyang, South Korea
Keywords
sound source separation; non-uniform linear microphone array; azimuth-frequency analysis; convolutional neural network; STANDARD;
DOI
10.1109/icaiic.2019.8668841
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Since MPEG-H supports not only channel-based but also object-based audio content, there is a need for a sound source separation technique that converts channel-based to object-based audio. Among the various sound source separation techniques, azimuth-frequency (AF) based sound source separation has been proposed for converting channel-based audio to object-based audio. Unfortunately, it is difficult to set the optimal azimuth and width using this technique. In this paper, we propose a method to determine the optimal azimuth and width based on a convolutional neural network (CNN) classifier. First, depending on numerous azimuths and widths, different sets of audio signals are separated. After that, each audio set is categorized into a specific audio class using the CNN classifier. Then, in order to separate a desired audio signal, the azimuth and width with the highest similarity for a given class are selected. The performance of the CNN classifier is evaluated in terms of separation accuracy and objective measures such as signal-to-distortion ratio (SDR), signal-to-interference ratio (SIR), and signal-to-artifacts ratio (SAR). Consequently, the proposed method provides higher SDR, SIR, SAR, and separation accuracy than a minimum variance distortionless response (MVDR) beamformer as well as a method that only uses AF analysis.
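The selection step described in the abstract (separate with many azimuth/width pairs, classify each result, keep the pair that scores highest for the desired class) can be sketched as a simple grid search. This is only an illustrative sketch, not the authors' implementation: the function name `select_azimuth_width`, the callables `separate` and `classify`, and the toy stand-ins below are all hypothetical, and the classifier score is assumed to be a per-class probability.

```python
import numpy as np


def select_azimuth_width(mixture, target_class, azimuths, widths, separate, classify):
    """Grid-search (azimuth, width) and return the best (azimuth, width, score).

    `separate(mixture, azimuth, width)` and `classify(signal)` are hypothetical
    callables standing in for AF-based separation and a CNN classifier that
    returns one score per audio class.
    """
    best = None
    for az in azimuths:
        for w in widths:
            candidate = separate(mixture, az, w)   # one separated signal per setting
            scores = classify(candidate)           # per-class scores for that signal
            score = float(scores[target_class])
            if best is None or score > best[2]:    # keep the highest-scoring setting
                best = (az, w, score)
    return best


# Toy stand-ins (assumptions for illustration only; no real audio is processed).
def toy_separate(mixture, azimuth, width):
    # A real AF separator would mask the azimuth-frequency plane of the mixture.
    return np.array([azimuth, width], dtype=float)


def toy_classify(signal):
    # Pretend class 0 is best isolated at azimuth 30 degrees with width 10.
    az, w = signal
    p0 = np.exp(-((az - 30.0) ** 2) / 200.0 - ((w - 10.0) ** 2) / 50.0)
    return np.array([p0, 1.0 - p0])


result = select_azimuth_width(
    mixture=None,
    target_class=0,
    azimuths=range(0, 181, 10),
    widths=[5, 10, 20],
    separate=toy_separate,
    classify=toy_classify,
)
```

With the toy stand-ins, `result` holds the azimuth/width pair at which the toy score for class 0 peaks; with a real separator and classifier the same loop would be run once per desired audio class.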
Pages: 500-503
Page count: 4