Pitch segmentation of speech signals based on short-time energy waveform

Cited by: 2

Authors:

Wiriyarattanakul S. ^{[1]}

Eua-anant N. ^{[1]}

Affiliations:

[1] Department of Computer Engineering, Faculty of Engineering, Khon Kaen University, Khon Kaen

Source:

Wiriyarattanakul, Sopon (sopon_w@kkumail.com) | Year: 1600 / Volume: Springer Science and Business Media, LLC / Issue: 20

Keywords:

Fundamental frequency; Pitch detection; Pitch segmentation; Short-time energy waveform; Speech signal; Voice signal;

DOI:

10.1007/s10772-017-9459-4

Abstract:

In general, speech consists of quasi-repetitive patterns called pitches, which represent the fundamental period of the speech and the tonal information of the voice. Extraction of pitch information, which is crucial for many speech processing techniques, usually suffers from noise and from interference caused by high-order harmonic components. This paper introduces a novel, noise-robust method for determining the speech fundamental frequency and for pitch segmentation, based on a short-time energy waveform (SEW), defined as a moving average of the squared signal. When a moving average filter with a window size close to the fundamental period is applied, nearly repetitive patterns with fewer ripples, synchronized with the actual pitches, can be clearly observed in the SEW. The DC component of the SEW is removed using morphological top-hat and bottom-hat transforms. The fundamental frequency is determined as the frequency corresponding to the largest peak of the power spectrum of the DC-removed SEW. Finally, a time-domain window search is performed to locate the local extrema associated with pitches. Compared to traditional pitch detection techniques, the proposed technique yields pitch segmentation results with higher accuracy and greater noise robustness. © 2017, Springer Science+Business Media, LLC.
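The processing chain named in the abstract (SEW, morphological DC removal, spectral peak, window search) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the top-hat and bottom-hat outputs are combined, so subtracting one from the other is an assumption here, as are all function names, the zero-padding at the signal edges, the 60 to 400 Hz search band, the 1 Hz frequency grid, the quarter-period search tolerance, and the synthetic test signal. The demo uses an averaging window somewhat shorter than the true period, because a window exactly one period long would flatten a perfectly periodic signal.

```python
import math

def sew(x, win):
    """Short-time energy waveform: moving average of the squared signal.
    Samples outside the signal are treated as zero (assumption)."""
    sq = [v * v for v in x]
    half, n = win // 2, len(sq)
    return [sum(sq[max(0, i - half):min(n, i + half + 1)]) / (2 * half + 1)
            for i in range(n)]

def _slide(x, half, fn):
    """Apply fn (min or max) over a centered window, truncated at the edges."""
    n = len(x)
    return [fn(x[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]

def remove_dc(s, win):
    """Top-hat minus bottom-hat of s (assumed way of combining the two)."""
    half = win // 2
    opened = _slide(_slide(s, half, min), half, max)  # erosion, then dilation
    closed = _slide(_slide(s, half, max), half, min)  # dilation, then erosion
    # top-hat = s - opened, bottom-hat = closed - s
    return [(v - o) - (c - v) for v, o, c in zip(s, opened, closed)]

def fundamental_frequency(y, fs, fmin=60, fmax=400):
    """Frequency on a 1 Hz grid with the largest spectral power in [fmin, fmax]."""
    best_f, best_p = fmin, -1.0
    for f in range(fmin, fmax + 1):
        w = 2.0 * math.pi * f / fs
        re = sum(v * math.cos(w * n) for n, v in enumerate(y))
        im = sum(v * math.sin(w * n) for n, v in enumerate(y))
        p = re * re + im * im
        if p > best_p:
            best_f, best_p = f, p
    return best_f

def mark_pitches(y, period):
    """Window search: one local maximum per expected period."""
    tol = max(1, period // 4)
    i = max(range(min(period, len(y))), key=y.__getitem__)  # first peak
    marks = [i]
    while i + period + tol < len(y):
        i = max(range(i + period - tol, i + period + tol + 1),
                key=y.__getitem__)
        marks.append(i)
    return marks

# Synthetic stand-in for voiced speech: a decaying 1 kHz burst repeated
# every 64 samples at 8 kHz, i.e. a 125 Hz fundamental (hypothetical data).
FS, PERIOD = 8000, 64
x = [math.exp(-(n % PERIOD) / 15.0)
     * math.sin(2 * math.pi * 1000 * (n % PERIOD) / FS) for n in range(800)]

y = remove_dc(sew(x, win=48), win=96)   # window sizes are rough guesses
f0 = fundamental_frequency(y, FS)
marks = mark_pitches(y, round(FS / f0))
```

In this sketch `f0` is the estimated fundamental frequency in Hz and `marks` holds the sample indices of successive pitch marks, so the spacing between neighbouring marks approximates the pitch period in samples. Real speech would additionally need a voiced/unvoiced decision and a data-driven choice of window size, neither of which is covered here.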

Pages: 907-917

Number of pages: 10