Robust Pitch Estimation and Tracking for Speakers Based on Subband Encoding and the Generalized Labeled Multi-Bernoulli Filter

Times cited: 10

Author:

Lin, Shoufeng [1]

Affiliation:

[1] Curtin Univ, Sch Elect Engn Comp & Math Sci, Bentley, WA 6102, Australia

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2019, Vol. 27, No. 4

Keywords:

Pitch tracking; auditory filterbank; CASA; frequency coverage; autocorrelation; GLMB tracking filter; Ornstein-Uhlenbeck process; measurement driven birth; RANDOM FINITE SETS; FUNDAMENTAL-FREQUENCY; SPEECH; ALGORITHM; EXTRACTION; SEPARATION; NOISE; MODEL

DOI:

10.1109/TASLP.2019.2898818

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

This paper proposes a new pitch estimator and a novel pitch tracker for speakers. We first decompose the sound signal into subbands using an auditory filterbank, assuming time-frequency sparsity of human speech. Instead of selecting the number of subbands empirically, we propose a novel frequency coverage metric to derive both the number of subbands and the center frequencies of the filterbank. The subband signals are then encoded using a scheme inspired by the computational auditory scene analysis (CASA) approach, and their normalized autocorrelations are computed for pitch estimation. To suppress spurious errors and track speaker identity, we exploit the temporal continuity constraint and adapt a generalized labeled multi-Bernoulli (GLMB) filter for pitch tracking, using a novel pitch state transition model based on the Ornstein-Uhlenbeck process and a measurement-driven birth model for adaptively initiating new pitch targets. Experimental evaluations with various additive noises show that the proposed methods achieve better accuracy than several state-of-the-art pitch estimation methods in most of the studied scenarios. Tests on real recordings in a reverberant room also show that the proposed method is robust against reverberation.
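The abstract states that normalized autocorrelations of the encoded subband signals are used for pitch estimation. As a rough illustration of that idea only, the sketch below applies a normalized autocorrelation to a single full-band frame and returns the pitch implied by the first strong peak. The function names `normalized_autocorrelation` and `estimate_pitch`, the 60 to 400 Hz search range and the 0.9 peak-ratio rule are assumptions made for this example; the paper's auditory filterbank, subband encoding and cross-subband combination are not reproduced here.

```python
import math

def normalized_autocorrelation(frame, lag):
    """Correlation of `frame` with itself shifted by `lag` samples, scaled to [-1, 1]."""
    n = len(frame) - lag
    x, y = frame[:n], frame[lag:lag + n]
    num = sum(a * b for a, b in zip(x, y))
    # Normalize by the energies of the two overlapping segments.
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den if den > 0.0 else 0.0

def estimate_pitch(frame, fs, f_min=60.0, f_max=400.0, ratio=0.9):
    """Return (f0 in Hz, peak value) for one frame sampled at `fs` Hz (hypothetical helper)."""
    lag_min = max(1, int(fs / f_max))
    lag_max = min(int(fs / f_min), len(frame) - 2)
    lags = list(range(lag_min, lag_max + 1))
    r = [normalized_autocorrelation(frame, lag) for lag in lags]
    peak = max(r)
    # A periodic frame also peaks at 2x, 3x, ... its period; choosing the first
    # local maximum close to the global one avoids an octave-too-low estimate.
    for i in range(1, len(r) - 1):
        if r[i] >= r[i - 1] and r[i] >= r[i + 1] and r[i] >= ratio * peak:
            return fs / lags[i], r[i]
    best = r.index(peak)  # fallback: no interior local maximum qualified
    return fs / lags[best], peak

fs = 8000
tone = [math.sin(2 * math.pi * 200 * n / fs) for n in range(400)]  # 50 ms of a 200 Hz tone
f0, strength = estimate_pitch(tone, fs)  # f0 == 200.0 (lag of 40 samples)
```

Taking the first qualifying local maximum rather than the global one is a common guard against subharmonic (octave) errors, because a periodic frame correlates almost equally well with itself at two or three periods. Spurious frame-level errors that survive such rules are what the tracking stage described in the abstract is meant to suppress.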

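The abstract describes the tracker's pitch state transition model as being based on the Ornstein-Uhlenbeck (OU) process. A minimal sketch, assuming the textbook OU stochastic differential equation dx = -θ(x - μ)dt + σ dW with a scalar pitch state in Hz, is its exact discretization over one frame hop Δt: the next state is Gaussian with mean μ + (x - μ)e^(-θΔt) and variance σ²(1 - e^(-2θΔt))/(2θ). The function names and the parameter values (μ = 150 Hz, θ = 10 per second, σ = 20, 10 ms hop) are hypothetical; how the paper itself parameterizes the model is not stated in this record.

```python
import math
import random

def ou_transition(x_prev, mu, theta, sigma, dt):
    """Exact Gaussian transition of dx = -theta*(x - mu) dt + sigma dW over a step dt.

    Returns (mean, variance) of x_k given x_{k-1} = x_prev.
    """
    decay = math.exp(-theta * dt)  # fraction of the deviation from mu that survives one step
    mean = mu + (x_prev - mu) * decay
    var = sigma ** 2 * (1.0 - decay ** 2) / (2.0 * theta)
    return mean, var

def ou_log_density(x_next, x_prev, mu, theta, sigma, dt):
    """Log of the Gaussian transition density p(x_next | x_prev)."""
    mean, var = ou_transition(x_prev, mu, theta, sigma, dt)
    return -0.5 * (math.log(2.0 * math.pi * var) + (x_next - mean) ** 2 / var)

def ou_sample(x_prev, mu, theta, sigma, dt, rng=random):
    """Draw one predicted pitch value from the transition density."""
    mean, var = ou_transition(x_prev, mu, theta, sigma, dt)
    return rng.gauss(mean, math.sqrt(var))

# Hypothetical values: pitch in Hz, 10 ms hop, mean-reverting toward 150 Hz.
rng = random.Random(0)
track = [250.0]
for _ in range(5):
    track.append(ou_sample(track[-1], mu=150.0, theta=10.0, sigma=20.0, dt=0.01, rng=rng))
```

Unlike a random walk, whose predictive variance grows without bound, the OU variance saturates at σ²/(2θ) and the mean is pulled back toward μ, which suits a pitch contour that wanders around a speaker's typical range. In a Bayesian multi-target tracker such as a GLMB filter, a function like `ou_log_density` would play the role of the single-target transition density for each labeled pitch track.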

Pages: 827–841

Page count: 15
