Pitch segmentation of speech signals based on short-time energy waveform

Cited by: 2
Authors
Wiriyarattanakul S. [1]
Eua-anant N. [1]
Affiliation
[1] Department of Computer Engineering, Faculty of Engineering, Khon Kaen University, Khon Kaen
Source
Wiriyarattanakul, Sopon (sopon_w@kkumail.com) | Springer Science and Business Media, LLC | 2017 | Issue 20
Keywords
Fundamental frequency; Pitch detection; Pitch segmentation; Short-time energy waveform; Speech signal; Voice signal
DOI
10.1007/s10772-017-9459-4
Abstract
In general, speech consists of quasi-repetitive patterns called pitches, which represent the fundamental period and tonal information of the voice. Extraction of pitch information, which is crucial for many speech processing techniques, is usually hampered by noise and by interference from high-order harmonic components. This paper introduces a novel, noise-robust method for determining the speech fundamental frequency and performing pitch segmentation, based on a short-time energy waveform (SEW), defined as the moving average of the squared signal. When a moving average filter with a window size close to the fundamental period is applied, nearly repetitive patterns with fewer ripples, synchronized with the actual pitches, can be clearly observed in the SEW. The DC component of the SEW is removed using morphological top-hat and bottom-hat transforms. The fundamental frequency is determined as the frequency corresponding to the largest peak of the power spectrum of the DC-removed SEW. Finally, a time-domain window search is performed to locate the local extrema associated with pitches. Compared to traditional pitch detection techniques, the proposed technique yields pitch segmentation results with higher accuracy and greater noise robustness. © 2017, Springer Science+Business Media, LLC.
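The four stages named in the abstract (SEW, DC removal, spectral peak, window search) can be sketched roughly as follows. This is an illustrative reading of the abstract only, not the authors' implementation: every function name and parameter (`win`, `k`, `fmin`, `fmax`, `tol`) is hypothetical. The way the top-hat and bottom-hat outputs are combined (a difference), the flat structuring element of about twice the averaging window, and the restriction of the final search to local maxima are all assumptions. Only the Python standard library is used so that the sketch runs without extra packages; an array library would be the natural choice for real recordings.

```python
import cmath
import math


def short_time_energy(x, win):
    """SEW: centred moving average of the squared signal (zero-padded ends)."""
    half = win // 2
    sq = [v * v for v in x]
    return [sum(sq[max(0, i - half):i + half + 1]) / (2 * half + 1)
            for i in range(len(sq))]


def _slide(x, k, op):
    """Apply min or max over a centred window of about k samples."""
    half = k // 2
    return [op(x[max(0, i - half):i + half + 1]) for i in range(len(x))]


def remove_dc(sew, k):
    """Assumed combination: top-hat minus bottom-hat with a flat element."""
    opened = _slide(_slide(sew, k, min), k, max)   # erosion, then dilation
    closed = _slide(_slide(sew, k, max), k, min)   # dilation, then erosion
    # top-hat = sew - opened, bottom-hat = closed - sew
    return [(s - o) - (c - s) for s, o, c in zip(sew, opened, closed)]


def fundamental_frequency(y, fs, fmin=60.0, fmax=400.0, step=1.0):
    """Frequency (Hz) of the largest power-spectrum peak within [fmin, fmax]."""
    best_f, best_p = fmin, -1.0
    for i in range(int((fmax - fmin) / step) + 1):
        f = fmin + i * step
        w = -2j * math.pi * f / fs
        p = abs(sum(v * cmath.exp(w * n) for n, v in enumerate(y))) ** 2
        if p > best_p:
            best_f, best_p = f, p
    return best_f


def pitch_marks(y, period, tol=0.2):
    """Walk outwards from the global maximum, one period at a time,
    snapping each step to the largest sample within +/- tol * period."""
    r = int(period * tol)
    start = max(range(len(y)), key=y.__getitem__)
    marks = [start]
    for step in (-period, period):
        pos = start + step
        while 0 <= pos < len(y):
            lo, hi = max(0, pos - r), min(len(y), pos + r + 1)
            pos = max(range(lo, hi), key=y.__getitem__)
            marks.append(pos)
            pos += step
    return sorted(marks)


def estimate_pitch(x, fs, win):
    """Chain the four stages; win is a rough period guess in samples."""
    sew = short_time_energy(x, win)
    y = remove_dc(sew, 2 * win + 1)
    f0 = fundamental_frequency(y, fs)
    return f0, pitch_marks(y, round(fs / f0))
```

A call such as `f0, marks = estimate_pitch(samples, fs=8000, win=45)` returns the estimated fundamental frequency in Hz and a sorted list of sample indices. Marks close to either end of the signal are less reliable in this sketch, because the averaging and morphological windows are truncated there.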
Pages: 907-917
Page count: 10