Single-ended speech quality measurement using machine learning methods

Cited by: 62

Authors:

Falk, Tiago H. [1]

Chan, Wai-Yip [1]

Affiliations:

[1] Queens Univ, Dept Elect & Comp Engn, Kingston, ON K7L 3N6, Canada

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2006 / Vol. 14 / No. 06

Funding:

Natural Sciences and Engineering Research Council of Canada;

Keywords:

mean opinion score (MOS); objective quality measurement; quality model; single-ended measurement; speech communication; speech distortions; speech enhancement; speech quality; subjective quality;

DOI:

10.1109/TASL.2006.883253

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

We describe a novel single-ended algorithm constructed from models of speech signals, including clean and degraded speech, and speech corrupted by multiplicative noise and temporal discontinuities. Machine learning methods are used to design the models, including Gaussian mixture models, support vector machines, and random forest classifiers. Estimates of the subjective mean opinion score (MOS) generated by the models are combined using hard or soft decisions generated by a classifier which has learned to match the input signal with the models. Test results show the algorithm outperforming ITU-T P.563, the current "state-of-art" standard single-ended algorithm. Employed in a distributed double-ended measurement configuration, the proposed algorithm is found to be more effective than P.563 in assessing the quality of noise reduction systems and can provide a functionality not available with P.862 PESQ, the current double-ended standard algorithm.
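The hard and soft decision combining mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `combine_mos`, its arguments, and the example numbers are hypothetical, and the sketch assumes the classifier outputs one non-negative score per model that can be normalized into weights.

```python
def combine_mos(model_estimates, class_scores, soft=True):
    """Combine per-model MOS estimates using classifier outputs.

    model_estimates: one MOS estimate per signal model (hypothetical inputs).
    class_scores: non-negative classifier scores, one per model, expressing
        how well the input signal matches each model (an assumption here).
    soft=True  -> weighted average of the estimates (soft decision).
    soft=False -> estimate of the best-matching model only (hard decision).
    """
    if len(model_estimates) != len(class_scores):
        raise ValueError("need one classifier score per model estimate")
    total = sum(class_scores)
    if total <= 0:
        raise ValueError("classifier scores must sum to a positive value")

    # Normalize the scores so that they act as weights summing to 1.
    weights = [s / total for s in class_scores]

    if soft:
        return sum(w * m for w, m in zip(weights, model_estimates))

    # Hard decision: pick the model with the largest weight.
    best = max(range(len(weights)), key=weights.__getitem__)
    return model_estimates[best]


# Illustrative numbers only, not results from the paper.
soft_mos = combine_mos([4.0, 2.0], [0.75, 0.25], soft=True)
hard_mos = combine_mos([4.0, 2.0], [0.75, 0.25], soft=False)
```

With these illustrative inputs the soft decision blends both estimates according to the classifier's confidence, while the hard decision returns only the estimate of the most likely model.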

Citation

Pages: 1935-1947

Page count: 13

References: 32 in total

