Enabling Smart Mobility Features Using Spectrogram Images and Convolutional Neural Networks

Cited by: 0

Authors:

Zhao, Xu Fang [1]

Tsimhoni, Omer [2]

Affiliations:

[1] Gen Motors Co, Compan & Platforms, Warren, MI 48092 USA

[2] Gen Motors Co, Connected Vehicle Experience, Warren, MI USA

Source:

2024 IEEE INTERNATIONAL CONFERENCE ON SMART MOBILITY, SM 2024 | 2024

Keywords:

pitch extraction; convolutional neural network; machine learning; spectrogram; Smart Mobility; NOISY

DOI:

10.1109/SM63044.2024.10733384

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Pitch (also called F0 or fundamental frequency) is an important voice feature for smart mobility functions such as driver emotion detection, personalized vehicle profiles, and secure speaker identification. This paper presents a novel approach that uses Convolutional Neural Networks (CNNs) and image processing techniques to estimate F0 directly from spectrogram images. The approach demonstrates very good detection accuracy: 92% of predicted pitch contours have strong or moderate correlations with the true pitch contours. Furthermore, an experimental comparison with other state-of-the-art CNN methods shows that the approach increases detection accuracy by 3 to 5 percentage points across various Signal-to-Noise Ratio (SNR) conditions.
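The abstract reports results in terms of correlation between predicted and true pitch contours. As a minimal sketch of how such a comparison could be computed, the Python below calculates a Pearson correlation between two contours and assigns a strength label. The record does not state which correlation measure or cut-offs the authors used, so the choice of Pearson correlation, the function names, the sample values, and the 0.7 and 0.4 thresholds are all assumptions made for illustration.

```python
import math


def pearson_correlation(predicted_hz, reference_hz):
    """Pearson correlation between two equal-length pitch contours (in Hz)."""
    if len(predicted_hz) != len(reference_hz) or len(predicted_hz) < 2:
        raise ValueError("contours must have the same length (at least 2 frames)")
    n = len(predicted_hz)
    mean_p = sum(predicted_hz) / n
    mean_r = sum(reference_hz) / n
    cov = sum((p - mean_p) * (r - mean_r)
              for p, r in zip(predicted_hz, reference_hz))
    var_p = sum((p - mean_p) ** 2 for p in predicted_hz)
    var_r = sum((r - mean_r) ** 2 for r in reference_hz)
    if var_p == 0 or var_r == 0:
        raise ValueError("correlation is undefined for a constant contour")
    return cov / math.sqrt(var_p * var_r)


def label_strength(r, strong=0.7, moderate=0.4):
    """Label a correlation value.

    The thresholds are hypothetical, not taken from the paper. Negative
    correlations fall into "weak" because they indicate a contour that
    moves against the reference.
    """
    if r >= strong:
        return "strong"
    if r >= moderate:
        return "moderate"
    return "weak"


# Made-up per-frame F0 values in Hz, for illustration only.
reference = [120.0, 125.0, 130.0, 128.0, 122.0]
predicted = [118.0, 126.0, 131.0, 127.0, 121.0]
r = pearson_correlation(predicted, reference)
print(label_strength(r))
```

Under these assumptions, the share of utterances labeled "strong" or "moderate" would correspond to the kind of aggregate figure quoted in the abstract.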

Pages: 105-109

Page count: 5

