Octant Spherical Harmonics Features for Source Localization Using Artificial Intelligence Based on Unified Learning Framework

Cited by: 3
Authors
Dwivedi P. [1, 2]
Routray G. [1, 2]
Hegde R.M. [1, 2, 3]
Affiliations
[1] the Department of Electrical Engineering, Indian Institute of Technology, Kanpur
[2] the Department of Electrical Engineering, Indian Institute of Technology, Kanpur
[3] the Department of Electrical Engineering, Indian Institute of Technology, Dharwad
Source
IEEE Transactions on Artificial Intelligence | 2024 / Vol. 5 / No. 8
Keywords
Direction of arrival (DOA); learning approach; spherical harmonic (SH) domain; support vector machine (SVM); unified convolutional neural network (UCNN)
DOI
10.1109/TAI.2024.3352530
Abstract
Recent advances in artificial intelligence (AI) have produced promising solutions for acoustic source localization in three-dimensional space. This article proposes a new low-complexity AI-based framework in the spherical harmonics (SH) domain for efficient direction of arrival (DOA) estimation. The SH coefficients are the key features for DOA estimation and are obtained from the SH decomposition (SHD) of the spherical microphone array (SMA) recordings. A unified convolutional neural network (UCNN) model is then trained to estimate the source azimuth and elevation from the phase and magnitude of the SH coefficients. Because the relation between the azimuth and elevation and the phase and magnitude of the SH coefficients is not one-to-one over the full 3-D space, a high volume of data is required to train the model. To address this, the symmetry properties of the SH basis functions are exploited to obtain the SH implicit symmetric coefficients (SH-ISCs), which split the 3-D space into octant classes. Within each octant, the phase and magnitude of the SH coefficients exhibit a one-to-one correspondence with the source azimuth and elevation, which reduces data redundancy. The work has two parts. In the first, a multiclass support vector machine (M-SVM) is investigated to obtain the octant class from the SH-ISCs. In the second, the UCNN model is developed to estimate the DOA angles within each octant class. The proposed technique is also more efficient than the baseline learning algorithms in terms of computational and run-time complexity.
Impact Statement: DOA estimation is an important task in signal processing that involves determining the angle of arrival of signals at an array of sensors or antennas. AI techniques, such as machine learning and deep learning, can significantly enhance DOA estimation by providing more accurate, efficient, robust, and adaptable solutions. AI algorithms can learn complex patterns and relationships in data, optimize for specific hardware, and adapt to changing signal environments. This work explores the significance of AI in DOA estimation and highlights the potential benefits that AI can bring to this critical task in signal processing. M-SVM and UCNN models are studied in this work. Combining these learning models provides robust DOA estimation from the SH features. Performance, measured in terms of accuracy, root mean square error, and complexity, yields intriguing findings that encourage using the proposed model in real-world scenarios.
© 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
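The octant split described in the abstract can be illustrated with a minimal sketch. It only shows how a known DOA maps to one of eight octant classes, which is the kind of label an octant classifier would be trained to predict; it does not reproduce the SH-ISC features or the M-SVM from the article. The function name `octant_of`, the angle conventions (azimuth in the x-y plane from the +x axis, elevation from the x-y plane), and the 0 to 7 label encoding are assumptions made for illustration.

```python
import math


def octant_of(azimuth_deg: float, elevation_deg: float) -> int:
    """Return a hypothetical octant label in 0..7 for a direction of arrival.

    Assumed conventions (not taken from the article):
    - azimuth is measured in the x-y plane from the +x axis, in degrees
    - elevation is measured from the x-y plane, in degrees (-90 to 90)
    - bit 0 is set when x < 0, bit 1 when y < 0, bit 2 when z < 0
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)

    # Unit direction vector of the source in Cartesian coordinates.
    x = math.cos(el) * math.cos(az)
    y = math.cos(el) * math.sin(az)
    z = math.sin(el)

    # One bit per axis sign gives eight distinct classes.
    return int(x < 0) | (int(y < 0) << 1) | (int(z < 0) << 2)


if __name__ == "__main__":
    # Directions well inside an octant; boundary angles are ambiguous
    # in floating point and are deliberately avoided here.
    for az, el in [(45, 30), (135, 30), (-45, 30), (225, -30)]:
        print(az, el, octant_of(az, el))
```

Under this encoding, a second-stage regressor would only need to resolve the angles within the predicted octant, which is the two-part structure the abstract describes.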
Pages: 3845-3857
Page count: 12