Robustness and Accuracy of Time Delay Estimation in a Live Room

Cited by: 0

Authors:

Bayya, Yegnanarayana [1]

Murthy, B. H. V. S. Narayana [2]

Satyanarayana, J. V. [2]

Pannala, Vishala [1]

Chennupati, Nivedita [1]

Affiliations:

[1] IIIT, Speech Proc Lab, Hyderabad, Telangana, India

[2] Res Ctr Imarat, Hyderabad, Telangana, India

Source:

2021 National Conference on Communications (NCC) | 2021

Keywords:

time delay estimation; cross-correlation; single frequency filtering; speaker tracking; speech

DOI:

10.1109/NCC52529.2021.9530120

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Subject classification code:

0812

Abstract:

Estimating the time delay from received broadband signals such as speech, collected at two or more spatially distributed microphones, has many applications. Direct cross-correlation of the signals and generalized cross-correlation methods (GCC and GCC-PHAT) have been used for many years to estimate the time delay. The performance of these methods degrades in a practical environment such as a live room because of noise, multipath reflections, and reverberation. The estimated time delay is usually robust because the delay is averaged over several frames in an utterance lasting a few seconds. Robustness suffers when the varying time delay of a moving speaker is required: a shorter averaging duration produces errors in the estimated time delay, whereas a longer averaging duration reduces accuracy. Since single frequency filtering (SFF) based analysis provides an estimate of the instantaneous time delay, it is possible to study the trade-off between accuracy and robustness. This paper examines this trade-off in determining the number of stationary speakers from mixed signals and in tracking a speaker moving along a straight-line path and along a circular path. The results are illustrated with actual data collected in a live room.
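The averaging trade-off described in the abstract can be illustrated with a minimal sketch. It uses GCC-PHAT, one of the baseline methods named above, and not the SFF-based analysis the paper studies. The function names (`gcc_phat`, `framewise_delays`, `smooth_delays`), the frame length, the hop size, and the synthetic white-noise test signal are all assumptions made for illustration; none of them come from the paper.

```python
import numpy as np


def gcc_phat(x, y, fs, max_tau=None):
    """Estimate the delay of y relative to x, in seconds, with GCC-PHAT."""
    n = len(x) + len(y)                  # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = Y * np.conj(X)               # cross-power spectrum
    cross /= np.abs(cross) + 1e-12       # PHAT weighting: keep only the phase
    cc = np.fft.irfft(cross, n=n)        # generalized cross-correlation
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # Reorder so that lags run from -max_shift to +max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(cc)) - max_shift
    return lag / fs


def framewise_delays(x, y, fs, frame_len, hop, max_tau):
    """Return one delay estimate per frame (hypothetical helper)."""
    starts = range(0, len(x) - frame_len + 1, hop)
    return np.array([
        gcc_phat(x[s:s + frame_len], y[s:s + frame_len], fs, max_tau)
        for s in starts
    ])


def smooth_delays(delays, n_frames):
    """Moving average over n_frames consecutive per-frame estimates."""
    kernel = np.ones(n_frames) / n_frames
    return np.convolve(delays, kernel, mode="valid")


# Synthetic example (assumed values): white noise delayed by 12 samples.
fs = 16000
rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
true_lag = 12
y = np.concatenate((np.zeros(true_lag), x[:-true_lag]))

per_frame = framewise_delays(x, y, fs, frame_len=1024, hop=512, max_tau=0.005)
averaged = smooth_delays(per_frame, n_frames=3)
```

In this sketch, `n_frames` plays the role of the averaging duration: a larger value suppresses spurious per-frame estimates but blurs the delay trajectory of a moving source, while a smaller value follows changes more closely at the cost of more outliers.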


Pages: 440-445

Page count: 6

