2-D Processing of Speech for Multi-Pitch Analysis

Citations: 0
Authors
Wang, Tianyu T. [1]
Quatieri, Thomas F. [1]
Affiliations
[1] MIT Lincoln Lab, Lexington, MA USA
Source
INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5 | 2009
Keywords
2-D speech processing; Grating Compression Transform; multi-pitch analysis; segmental pitch dynamics
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper introduces a two-dimensional (2-D) processing approach for the analysis of multi-pitch speech sounds. Our framework invokes the short-space 2-D Fourier transform magnitude of a narrowband spectrogram, mapping harmonically related signal components to multiple concentrated entities in a new 2-D space. First, localized time-frequency regions of the spectrogram are analyzed to extract pitch candidates. These candidates are then combined across multiple regions to obtain separate pitch estimates of each speech-signal component at a single point in time. We refer to this as multi-region analysis (MRA). By explicitly accounting for pitch dynamics within localized time segments, this separability is distinct from that which can be obtained using short-time autocorrelation methods typically employed in state-of-the-art multi-pitch tracking algorithms. We illustrate the feasibility of MRA for multi-pitch estimation on mixtures of synthetic and real speech.
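The first step described in the abstract (a short-space 2-D Fourier transform magnitude over a localized region of a narrowband spectrogram, followed by peak picking) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, window lengths, patch size, zero-padding, pitch search range, and the use of a linear-magnitude spectrogram are all assumptions chosen for illustration, and the test signal is a single synthetic harmonic source with constant pitch rather than a speech mixture.

```python
import numpy as np

def narrowband_spectrogram(x, win_len=256, hop=8, nfft=1024):
    """Magnitude STFT with a long (narrowband) Hamming window.

    Rows are frequency bins, columns are frames. All sizes are assumed values.
    """
    win = np.hamming(win_len)
    starts = range(0, len(x) - win_len + 1, hop)
    frames = np.stack([x[s:s + win_len] * win for s in starts], axis=1)
    return np.abs(np.fft.rfft(frames, n=nfft, axis=0))

def patch_pitch_candidate(spec, df, f_lo_bin, n_bins, t_lo, n_frames,
                          pad=512, f0_range=(80.0, 400.0)):
    """Return one pitch candidate (Hz) from a localized time-frequency patch.

    df is the spectrogram bin spacing in Hz. The patch is mean-removed,
    tapered with a 2-D Hamming window, and zero-padded before the 2-D FFT.
    """
    patch = spec[f_lo_bin:f_lo_bin + n_bins, t_lo:t_lo + n_frames]
    patch = patch - patch.mean()  # suppress the DC term
    patch = patch * np.outer(np.hamming(n_bins), np.hamming(n_frames))
    mag2d = np.abs(np.fft.fft2(patch, s=(pad, pad), axes=(0, 1)))

    # Harmonics spaced f0 Hz apart repeat every f0/df bins along frequency,
    # which maps to row index pad*df/f0 in the zero-padded 2-D transform.
    q_min = int(np.ceil(pad * df / f0_range[1]))
    q_max = int(np.floor(pad * df / f0_range[0]))
    band = mag2d[q_min:q_max + 1, :]
    q_rel, _ = np.unravel_index(np.argmax(band), band.shape)
    return pad * df / (q_min + q_rel)

# Synthetic harmonic source (assumed test signal): constant 200 Hz pitch.
fs, f0, nfft = 8000, 200.0, 1024
t = np.arange(2400) / fs
x = sum(np.cos(2 * np.pi * k * f0 * t) for k in range(1, 20))

spec = narrowband_spectrogram(x, nfft=nfft)
f0_hat = patch_pitch_candidate(spec, fs / nfft, f_lo_bin=26, n_bins=128,
                               t_lo=100, n_frames=20)
print(f"pitch candidate: {f0_hat:.1f} Hz")
```

Because the search is restricted to a plausible pitch range, the residual energy near the origin of the 2-D transform is ignored. For a mixture of two harmonic sources one would expect more than one concentrated peak, and combining candidates across several patches (the multi-region analysis step) is not covered by this sketch.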
Pages: 2795-2798
Page count: 4
Related papers
6 in total
  • [1] EZZAT T, 2007, ISCA INTERSPEECH
  • [2] SUPER RESOLUTION PITCH DETERMINATION OF SPEECH SIGNALS
    MEDAN, Y
    YAIR, E
    CHAZAN, D
    [J]. IEEE TRANSACTIONS ON SIGNAL PROCESSING, 1991, 39 (01): 40 - 48
  • [3] QUATIERI TF, 2002, ISCA INTERSPEECH
  • [4] Stevens Kenneth N., 1998, ACOUSTIC PHONETICS
  • [5] A computationally efficient multipitch analysis model
    Tolonen, T
    Karjalainen, M
    [J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2000, 8 (06): 708 - 716
  • [6] A multipitch tracking algorithm for noisy speech
    Wu, MY
    Wang, DL
    Brown, GJ
    [J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2003, 11 (03): 229 - 241