Multi-speaker Direction of Arrival Estimation Using Audio and Visual Modalities with Convolutional Neural Network

Citations: 0
Authors
Wu, Yulin [1]
Hu, Ruimin [1]
Wang, Xiaochen [1]
Affiliations
[1] Wuhan Univ, Hubei Key Lab Multimedia & Network Commun Engn, Sch Comp Sci, Natl Engn Res Ctr Multimedia Software, Wuhan, Peoples R China
Source
2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME | 2023
Keywords
DoA estimation; 3D-CNNs and 2D-CNNs; residual dense; audio and visual modalities; LOCALIZATION;
DOI
10.1109/ICME55011.2023.00115
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In real-world scenes, audible and visible sound sources are closely aligned, and together they help humans localize sources precisely. To exploit the complementarity between audio and visual data in multi-speaker direction of arrival (DoA) estimation, we propose a novel network that combines 3D convolutional neural networks (3D-CNNs) and 2D-CNNs with residual dense blocks. It has two main advantages: 1) Both the audio and visual input features are low-level signal representations: the real and imaginary parts of the STFT coefficients for the audio feature and pixel coordinates for the visual feature. This allows the network to learn to extract the most informative high-level features. 2) 3D-CNNs with residual dense blocks are used for audio and visual feature mapping along the time and frequency axes, and the subsequent 2D-CNNs aggregate the high-level features along the DoA axis. Experimental results demonstrate promising sound source localization (SSL) performance.
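As a rough illustration of the audio input described in the abstract (real and imaginary parts of STFT coefficients), the sketch below stacks both parts of a multichannel STFT into one feature tensor. This is not the authors' code: the function name `stft_real_imag`, the Hann window, the frame length of 512, the hop of 256, and the 4-channel test signal are all assumptions chosen for the example, since the record does not give these settings.

```python
import numpy as np


def stft_real_imag(x, frame_len=512, hop=256):
    """Hypothetical helper: (channels, samples) -> (2*channels, frames, bins).

    The first `channels` planes hold the real parts and the last `channels`
    planes hold the imaginary parts of the STFT coefficients.
    Window type, frame_len and hop are assumed values, not from the paper.
    """
    x = np.atleast_2d(np.asarray(x, dtype=np.float64))
    n_ch, n_samples = x.shape
    if n_samples < frame_len:
        raise ValueError("signal is shorter than one frame")

    n_frames = 1 + (n_samples - frame_len) // hop
    window = np.hanning(frame_len)

    # Index matrix of shape (frames, frame_len) selecting each analysis frame.
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    frames = x[:, idx] * window              # (channels, frames, frame_len)

    spec = np.fft.rfft(frames, axis=-1)      # (channels, frames, frame_len//2 + 1)
    return np.concatenate([spec.real, spec.imag], axis=0)


# Example: one second of 4-channel noise at an assumed 16 kHz sampling rate.
rng = np.random.default_rng(0)
audio = rng.standard_normal((4, 16000))
feats = stft_real_imag(audio)
print(feats.shape)
```

Keeping the real and imaginary parts as separate planes, rather than converting to magnitude and phase, leaves the raw coefficients intact so that a downstream network can learn its own higher-level features, which is the motivation the abstract gives for using low-level inputs.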
Pages: 636-641
Page count: 6