Multi-speaker Direction of Arrival Estimation Using Audio and Visual Modalities with Convolutional Neural Network

Cited by: 0

Authors:

Wu, Yulin [1]

Hu, Ruimin [1]

Wang, Xiaochen [1]

Affiliations:

[1] Wuhan Univ, Hubei Key Lab Multimedia & Network Commun Engn, Sch Comp Sci, Natl Engn Res Ctr Multimedia Software, Wuhan, Peoples R China

Source:

2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME | 2023

Keywords:

DoA estimation; 3D-CNNs and 2D-CNNs; residual dense; audio and visual modalities; LOCALIZATION;

DOI:

10.1109/ICME55011.2023.00115

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In real-world scenes, audible and visible sound sources are closely aligned, which helps humans locate sources accurately. To exploit the complementarity between audio and visual data in multi-speaker direction of arrival (DoA) estimation, we propose a novel network that combines 3D convolutional neural networks (3D-CNNs) and 2D-CNNs with residual dense blocks. It has two main advantages: 1) Both the audio and visual inputs are low-level signal representations (the real and imaginary parts of the STFT coefficients for the audio feature and pixel coordinates for the visual feature), which allows the network to learn to extract the most informative high-level features. 2) 3D-CNNs with residual dense blocks map the audio and visual features along the time and frequency axes, and the subsequent 2D-CNNs aggregate the high-level features along the DoA axis. Experimental results demonstrate promising sound source localization (SSL) performance.
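The audio input named in the abstract, the real and imaginary parts of the STFT coefficients, can be sketched as follows. This is a minimal illustration only, not the authors' pipeline: the function names, the frame length of 512, the hop of 256, the Hann window, and the four-microphone layout in the usage note are all assumptions, since the record does not state them.

```python
import numpy as np


def stft_real_imag(signal, frame_len=512, hop=256):
    """Return real and imaginary STFT parts, shape (2, n_frames, n_bins).

    frame_len, hop, and the Hann window are illustrative assumptions.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (signal.size - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # One-sided FFT per frame: frame_len // 2 + 1 frequency bins.
    spec = np.fft.rfft(frames, axis=1)
    # Keep the raw complex coefficients as two real-valued planes.
    return np.stack([spec.real, spec.imag])


def multichannel_features(channels, frame_len=512, hop=256):
    """Stack per-microphone features, shape (n_mics, 2, n_frames, n_bins)."""
    return np.stack([stft_real_imag(ch, frame_len, hop) for ch in channels])
```

For example, `multichannel_features(np.random.randn(4, 2048))` treats each row as one hypothetical microphone channel and returns an array of shape `(4, 2, 7, 257)`. Keeping the real and imaginary parts, rather than magnitude alone, preserves the inter-channel phase information that DoA estimation relies on.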

Pages: 636-641

Page count: 6

50 items in total

[1] Neural Network Adaptation and Data Augmentation for Multi-Speaker Direction-of-Arrival Estimation
He, Weipeng
Motlicek, Petr
Odobez, Jean-Marc
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 1303 - 1317
[2] Multi-speaker DoA Estimation Using Audio and Visual Modality
Yulin Wu
Ruimin Hu
Xiaochen Wang
Shanfa Ke
Neural Processing Letters, 2023, 55 : 8887 - 8901
[3] Multi-speaker DoA Estimation Using Audio and Visual Modality
Wu, Yulin
Hu, Ruimin
Wang, Xiaochen
Ke, Shanfa
NEURAL PROCESSING LETTERS, 2023, 55 (07) : 8887 - 8901
[4] Multi-Speaker Direction of Arrival Estimation using SRP-PHAT Algorithm with a Weighted Histogram
Hadad, Elior
Gannot, Sharon
2018 IEEE INTERNATIONAL CONFERENCE ON THE SCIENCE OF ELECTRICAL ENGINEERING IN ISRAEL (ICSEE), 2018,
[5] MAXIMUM LIKELIHOOD MULTI-SPEAKER DIRECTION OF ARRIVAL ESTIMATION UTILIZING A WEIGHTED HISTOGRAM
Hadad, Elior
Gannot, Sharon
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 586 - 590
[6] Joint estimation of pitch and direction of arrival: improving robustness and accuracy for multi-speaker scenarios
Gerlach, Stephan
Bitzer, Joerg
Goetze, Stefan
Doclo, Simon
EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2014,
[7] Joint estimation of pitch and direction of arrival: improving robustness and accuracy for multi-speaker scenarios
Stephan Gerlach
Jörg Bitzer
Stefan Goetze
Simon Doclo
EURASIP Journal on Audio, Speech, and Music Processing, 2014 (1)
[8] Broadband Direction of Arrival Estimation Based on Convolutional Neural Network
Zhu, Wenli
Zhang, Min
Wu, Chenxi
Zeng, Lingqing
IEICE TRANSACTIONS ON COMMUNICATIONS, 2020, E103B (03) : 148 - 154
[9] Integration of audio-visual information for multi-speaker multimedia speaker recognition
Yang, Jichen
Chen, Fangfan
Cheng, Yu
Lin, Pei
DIGITAL SIGNAL PROCESSING, 2024, 145
[10] Direction of arrival estimation for smart antenna in multipath environment using convolutional neural network
Harkouss, Youssef
Shraim, Hassan
Bazzi, Hussein
INTERNATIONAL JOURNAL OF RF AND MICROWAVE COMPUTER-AIDED ENGINEERING, 2018, 28 (06)
