Enabling Smart Mobility Features Using Spectrogram Images and Convolutional Neural Networks

Times cited: 0
Authors
Zhao, Xu Fang [1]
Tsimhoni, Omer [2]
Affiliations
[1] Gen Motors Co, Compan & Platforms, Warren, MI 48092 USA
[2] Gen Motors Co, Connected Vehicle Experience, Warren, MI USA
Source
2024 IEEE INTERNATIONAL CONFERENCE ON SMART MOBILITY, SM 2024 | 2024
Keywords
pitch extraction; convolutional neural network; machine learning; spectrogram; Smart Mobility; NOISY
DOI
10.1109/SM63044.2024.10733384
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Pitch (also called F0 or fundamental frequency) is an important voice feature for smart mobility applications such as driver emotion detection, personalized vehicle profiles, and secure speaker identification. This paper presents a novel approach that uses Convolutional Neural Networks (CNNs) and image processing techniques to estimate pitch directly from spectrogram images. The approach achieves good detection accuracy: 92% of predicted pitch contours have strong or moderate correlations with the true pitch contours. Furthermore, an experimental comparison with other state-of-the-art CNN methods shows that our approach increases detection accuracy by 3 to 5 percentage points across various Signal-to-Noise Ratio (SNR) conditions.
Pages: 105-109
Number of pages: 5
Related papers
15 in total
  • [1] [Anonymous], 2017, ITU-T P.1110
  • [2] SAFE: A Statistical Approach to F0 Estimation Under Clean and Noisy Conditions
    Chu, Wei
    Alwan, Abeer
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2012, 20 (03) : 933 - 944
  • [3] YIN, a fundamental frequency estimator for speech and music
    de Cheveigné, A
    Kawahara, H
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2002, 111 (04) : 1917 - 1930
  • [4] Doval B., 1993, ICASSP-93. 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing (Cat. No.92CH3252-4), P221, DOI 10.1109/ICASSP.1993.319095
  • [5] Flanagan J. L., 1965, Speech Analysis Synthesis and Perception
  • [6] SPICE: Self-Supervised Pitch Estimation
    Gfeller, Beat
    Frank, Christian
    Roblek, Dominik
    Sharifi, Matt
    Tagliasacchi, Marco
    Velimirovic, Mihajlo
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 1118 - 1128
  • [7] Han Kun, 2014, IEEE INT C AC SPEECH
  • [8] HMM-Based Multipitch Tracking for Noisy and Reverberant Speech
    Jin, Zhaozhang
    Wang, DeLiang
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2011, 19 (05) : 1091 - 1102
  • [9] SPECTRAL-ANALYSIS AND DISCRIMINATION BY ZERO-CROSSINGS
    KEDEM, B
    [J]. PROCEEDINGS OF THE IEEE, 1986, 74 (11) : 1477 - 1493
  • [10] Kim JW, 2018, 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), P161, DOI 10.1109/ICASSP.2018.8461329