Construction of an effective telephone bandwidth specifically for speaker recognition

Cited by: 0

Authors:

Thiruvaran, T. [1]

Affiliations:

[1] Univ Jaffna, Fac Engn, Dept Elect & Elect Engn, Ariviyal Nagar, Kilinochchi 44000, Sri Lanka

Source:

JOURNAL OF THE NATIONAL SCIENCE FOUNDATION OF SRI LANKA | 2024 / Vol. 52 / No. 03

Keywords:

Frequency band shifting; speaker recognition; speaker specific information; speech intelligibility; SPEECH; EXTENSION;

DOI:

10.4038/jnsfsr.v52i3.11717

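Chinese Library Classification (CLC) number:

O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];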

Subject classification codes:

07 ; 0710 ; 09 ;

Abstract:

Although the telephone bandwidth is 0-4 kHz, speaker-specific information is not evenly distributed within this range and extends beyond 4 kHz. This is because the hypopharynx, especially the piriform fossa, affects the higher frequency region and contributes more to inter-speaker variation. By shifting the higher bands to a lower frequency region, where speaker-specific information is reduced, an effective 4 kHz bandwidth can be constructed to enhance speaker recognition performance. A method to achieve this was proposed previously; this paper extends it and validates it with additional experiments. Furthermore, this paper defines the theoretically possible frequency space in which the frequency shifting method can be applied. To validate the method for different combinations of bands, candidate bands were shifted in various directions in small steps. Speaker recognition experiments were conducted at each step to compare the performance against the baseband without any frequency shifting. Using the results of these extensive experiments, an approximate frequency space was defined in which frequency shifting performed better than the conventional 0-4 kHz baseband signal. A simplified frequency shifting method was also investigated. Finally, the speech intelligibility of the frequency-shifted narrowband speech signal was analyzed using objective speech quality measures. This showed that intelligibility was not significantly affected by the frequency shifting method.
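The abstract does not say how the band shifting is implemented, so the following is only a minimal sketch of the general idea, assuming the shift is done by relocating FFT bins. The helper name `shift_band_down`, the 4-6 kHz source band, the 2-4 kHz destination, and the 16 kHz sampling rate are all illustrative assumptions, not values or code from the paper.

```python
import numpy as np

def shift_band_down(x, fs, src_lo, src_hi, dst_lo, cutoff=4000.0):
    """Hypothetical helper: copy the [src_lo, src_hi) Hz band so that it
    starts at dst_lo Hz, then discard everything at or above `cutoff` Hz.
    This is an assumed FFT-bin relocation, not the paper's algorithm."""
    n = len(x)
    spec = np.fft.rfft(x)                      # one-sided spectrum
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)     # bin centre frequencies in Hz
    bin_hz = fs / n                            # frequency resolution

    src = np.flatnonzero((freqs >= src_lo) & (freqs < src_hi))
    offset = int(round((src_lo - dst_lo) / bin_hz))  # shift in bins

    out = spec.copy()
    out[src - offset] = spec[src]              # overwrite destination band
    out[freqs >= cutoff] = 0.0                 # keep only the 0-4 kHz range
    return np.fft.irfft(out, n=n)

# Illustration: a 5 kHz tone (outside the telephone band) plus a weaker
# 1 kHz tone, sampled at an assumed 16 kHz for one second.
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 5000 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)

y = shift_band_down(x, fs, src_lo=4000.0, src_hi=6000.0, dst_lo=2000.0)
peak_hz = np.argmax(np.abs(np.fft.rfft(y))) * fs / len(y)
print(f"strongest component after shifting: {peak_hz:.0f} Hz")
```

In this sketch the destination band is simply overwritten, so whatever originally occupied 2-4 kHz is lost; that mirrors the abstract's premise that the replaced region carries less speaker-specific information, but the actual band choices are what the paper's experiments explore.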

Pages: 283-298

Page count: 16

References (22 in total):

[1] Abel, Johannes; Fingscheidt, Tim. Artificial Speech Bandwidth Extension Using Deep Neural Networks for Wideband Spectral Envelope Estimation [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2018, 26 (01): 71-83
[2] [Anonymous], 2008, The NIST year 2008 speaker recognition evaluation plan
[3] [Anonymous], 2001, Perceptual evaluation of speech quality (PESQ): An objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs
[4] Cavalcanti, Julio Cesar; Eriksson, Anders; Barbosa, Plinio A. Acoustic analysis of vowel formant frequencies in genetically-related and non-genetically related speakers with implications for forensic speaker comparison [J]. PLOS ONE, 2021, 16 (02):
[5] Chen NF, 2015, INT CONF ACOUST SPEE, P5366, DOI 10.1109/ICASSP.2015.7178996
[6] Chennupati, Nivedita; Kadiri, Sudarsana Reddy; Yegnanarayana, B. Spectral and temporal manipulations of SFF envelopes for enhancement of speech intelligibility in noise [J]. COMPUTER SPEECH AND LANGUAGE, 2019, 54: 86-105
[7] Dang, JW; Honda, K. Acoustic characteristics of the piriform fossa in models and humans [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1997, 101 (01): 456-465
[8] Delvaux, Bertrand; Howard, David. A New Method to Explore the Spectral Impact of the Piriform Fossae on the Singing Voice: Benchmarking Using MRI-Based 3D-Printed Vocal Tracts [J]. PLOS ONE, 2014, 9 (07):
[9] Fu, QJ; Nogaki, G; Galvin, JJ. Auditory training with spectrally shifted speech: Implications for cochlear implant patient auditory rehabilitation [J]. JARO-JOURNAL OF THE ASSOCIATION FOR RESEARCH IN OTOLARYNGOLOGY, 2005, 6 (02): 180-189
[10] Hines, Andrew; Skoglund, Jan; Kokaram, Anil C.; Harte, Naomi. ViSQOL: an objective speech quality model [J]. EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2015,