Use of temporal information: Detection of periodicity, aperiodicity, and pitch in speech

Cited by: 66
Authors
Deshmukh, O [1]
Espy-Wilson, CY
Salomon, A
Singh, J
Affiliations
[1] Univ Maryland, Dept Elect & Comp Engn, College Pk, MD 20742 USA
[2] Univ Maryland, Inst Syst Res, College Pk, MD 20742 USA
[3] MIT, Speech Commun Grp, Res Lab Elect, Cambridge, MA 02142 USA
Source
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 2005 / Vol. 13 / No. 5
Funding
National Science Foundation (USA);
Keywords
aperiodic and periodic energy; average magnitude difference function (AMDF); pitch detection; speech preprocessing; voiced obstruents; voice quality;
DOI
10.1109/TSA.2005.851910
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
In this paper, we present a time-domain aperiodicity, periodicity, and pitch (APP) detector that estimates 1) the proportion of periodic and aperiodic energy in a speech signal and 2) the pitch period of the periodic component. The APP system is particularly useful in situations where the speech signal contains simultaneous periodic and aperiodic energy, as in the case of breathy vowels and some voiced obstruents. The performance of the APP system was evaluated on synthetic speech-like signals corrupted with noise at various levels of signal-to-noise ratio (SNR) and on three different natural speech databases that consist of simultaneously recorded electroglottograph (EGG) and acoustic data. When compared on a frame basis (at a frame rate of 2.5 ms), the results show excellent agreement between the periodic/aperiodic decisions made by the APP system and the estimates obtained from the EGG data (94.43% for periodicity and 96.32% for aperiodicity). The results also support previous studies that show that voiced obstruents are frequently manifested with either little or no aperiodic energy, or with strong periodic and aperiodic components. The EGG data were used as a reference for evaluating the pitch detection algorithm. The ground truth was not manually checked to rectify or exclude incorrect estimates. The overall gross error rate in pitch prediction across the three speech databases was 5.67%. In the case of synthetic speech-like data, the estimated SNR was found to be in close proportion to the actual SNR, and the pitch was always accurately found regardless of the presence of any shimmer or jitter.
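The keywords list the average magnitude difference function (AMDF), but the abstract does not describe how the APP detector uses it. As a rough illustration only, the following sketch shows the generic textbook AMDF applied to pitch-period estimation: the AMDF dips toward zero at lags equal to the signal period, so the deepest dip within a plausible lag range gives a period estimate. This is not the authors' APP system; the function names, the lag range, and the synthetic 100 Hz test tone are all assumptions made for the example.

```python
import math


def amdf(frame, max_lag):
    """Average magnitude difference function for lags 0..max_lag.

    For each lag, average |x[i] - x[i + lag]| over the overlapping samples.
    """
    n = len(frame)
    values = []
    for lag in range(max_lag + 1):
        count = n - lag
        total = sum(abs(frame[i] - frame[i + lag]) for i in range(count))
        values.append(total / count)
    return values


def estimate_pitch_period(frame, min_lag, max_lag):
    """Return the lag (in samples) of the deepest AMDF dip in [min_lag, max_lag]."""
    d = amdf(frame, max_lag)
    return min(range(min_lag, max_lag + 1), key=lambda lag: d[lag])


# Hypothetical test signal: a clean 100 Hz sinusoid sampled at 8 kHz,
# so the true period is 8000 / 100 = 80 samples.
fs = 8000
f0 = 100.0
frame = [math.sin(2 * math.pi * f0 * t / fs) for t in range(400)]

# Assumed search range of 20..150 samples (about 53 Hz to 400 Hz at 8 kHz).
period = estimate_pitch_period(frame, min_lag=20, max_lag=150)
pitch_hz = fs / period
```

On real speech, a plain minimum search like this is prone to octave errors and to noise, which is presumably part of what a full detector such as the one described has to handle; the sketch only conveys the basic idea of a lag-domain dip.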
Pages: 776-786
Page count: 11