Data-driven speech intelligibility prediction has been slow to take off. Datasets of measured speech intelligibility are scarce, so current data-driven models are relatively small and rely on hand-picked features, while classical predictors based on psychoacoustic models and heuristics remain the state of the art. This work proposes NSIP, a U-Net-inspired fully convolutional neural network architecture that predicts the intelligibility of time-domain speech, and trains and tests it on ten datasets. NSIP is compared with a frequency-domain data-driven predictor and with the classical state-of-the-art predictors STOI, ESTOI, HASPI and SIIB. NSIP is found to outperform these predictors on datasets seen during training; on unseen datasets, its performance is comparable to that of the classical predictors.
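The abstract names only the architecture family. To illustrate roughly what a U-Net-style fully convolutional predictor operating on a raw waveform looks like, here is a toy, untrained, single-channel sketch in plain Python. It is not the NSIP architecture. The depth, kernel size, average-pool downsampling, nearest-neighbour upsampling, additive (rather than concatenated) skip connections, global average pooling and sigmoid output are all assumptions chosen for brevity, and `ToyUNet1D` and its helpers are hypothetical names. A real model would use many channels, a deep-learning framework and weights trained on measured intelligibility data.

```python
import math
import random


def conv1d(x, kernel, bias=0.0):
    """'Same'-padded 1-D cross-correlation of a single-channel signal."""
    k = len(kernel)
    left = k // 2
    padded = [0.0] * left + list(x) + [0.0] * (k - 1 - left)
    return [sum(kernel[j] * padded[i + j] for j in range(k)) + bias
            for i in range(len(x))]


def relu(x):
    return [max(0.0, v) for v in x]


def downsample(x):
    """Halve the time resolution by averaging non-overlapping sample pairs."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]


def upsample(x, length):
    """Nearest-neighbour upsampling back to `length` samples."""
    return [x[min(i // 2, len(x) - 1)] for i in range(length)]


def sigmoid(z):
    """Numerically stable logistic function, mapping to (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class ToyUNet1D:
    """Hypothetical, untrained, single-channel U-Net-style predictor."""

    def __init__(self, depth=3, kernel_size=5, seed=0):
        rng = random.Random(seed)

        def kernel():
            return [rng.gauss(0.0, 0.3) for _ in range(kernel_size)]

        self.depth = depth
        self.encoder = [kernel() for _ in range(depth)]
        self.decoder = [kernel() for _ in range(depth)]
        self.out_w = rng.gauss(0.0, 1.0)
        self.out_b = rng.gauss(0.0, 1.0)

    def __call__(self, wave):
        if len(wave) < 2 ** self.depth:
            raise ValueError("waveform too short for this depth")
        skips = []
        h = list(wave)
        # Contracting path: convolve, keep a skip copy, then downsample.
        for kern in self.encoder:
            h = relu(conv1d(h, kern))
            skips.append(h)
            h = downsample(h)
        # Expanding path: upsample, merge the matching skip, then convolve.
        for kern, skip in zip(self.decoder, reversed(skips)):
            h = upsample(h, len(skip))
            h = [a + b for a, b in zip(h, skip)]  # additive skip (U-Net concatenates)
            h = relu(conv1d(h, kern))
        # Collapse the time axis to one scalar score in (0, 1).
        pooled = sum(h) / len(h)
        return sigmoid(self.out_w * pooled + self.out_b)


# Usage: 0.1 s of a noisy 440 Hz tone at 8 kHz as a stand-in for speech.
rng = random.Random(1)
fs = 8000
wave = [math.sin(2 * math.pi * 440 * n / fs) + 0.1 * rng.gauss(0.0, 1.0)
        for n in range(800)]
model = ToyUNet1D()
score = model(wave)
print(f"predicted score (untrained weights): {score:.3f}")
```

Because the network is fully convolutional and ends in a global pooling step, it accepts waveforms of any length above the minimum set by the depth. With random weights the printed score carries no meaning; it only shows the shape of the computation from samples to a single bounded value.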
Huber, Rainer
Affiliations:
Carl von Ossietzky Univ Oldenburg, Med Phys Sect, D-26111 Oldenburg, Germany
HorTech Natl Ctr Competence Hearing Aid Syst Tech, Res & Dev Sect, Oldenburg, Germany
Univ Toronto, Toronto, ON M5S 1A1, Canada

Kates, James M.
Affiliations:
Hearing Aid Manufacturer GN ReSound, New York, NY USA
Acoust Soc Amer, New York, NY USA
Audio Engn Soc, New York, NY USA
Univ Toronto, Toronto, ON M5S 1A1, Canada

Scollie, Susan
Affiliations:
Univ Western Ontario, Natl Ctr Audiol, London, ON, Canada
Univ Toronto, Toronto, ON M5S 1A1, Canada