Single-ended speech quality measurement using machine learning methods

Cited by: 62
Authors
Falk, Tiago H. [1]
Chan, Wai-Yip [1]
Affiliation
[1] Queens Univ, Dept Elect & Comp Engn, Kingston, ON K7L 3N6, Canada
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2006 / Vol. 14 / No. 6
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
mean opinion score (MOS); objective quality measurement; quality model; single-ended measurement; speech communication; speech distortions; speech enhancement; speech quality; subjective quality;
DOI
10.1109/TASL.2006.883253
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
We describe a novel single-ended algorithm constructed from models of speech signals, including clean and degraded speech, and speech corrupted by multiplicative noise and temporal discontinuities. Machine learning methods are used to design the models, including Gaussian mixture models, support vector machines, and random forest classifiers. Estimates of the subjective mean opinion score (MOS) generated by the models are combined using hard or soft decisions generated by a classifier which has learned to match the input signal with the models. Test results show the algorithm outperforming ITU-T P.563, the current "state-of-art" standard single-ended algorithm. Employed in a distributed double-ended measurement configuration, the proposed algorithm is found to be more effective than P.563 in assessing the quality of noise reduction systems and can provide a functionality not available with P.862 PESQ, the current double-ended standard algorithm.
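The abstract's distinction between hard and soft decisions can be illustrated with a minimal sketch. It assumes, hypothetically, that each model yields one MOS estimate and that the classifier yields one score per model; the paper's actual combination rule is not given in this record, so the function name `combine_mos` and the weighting scheme below are illustrative assumptions only.

```python
def combine_mos(mos_estimates, class_scores, soft=True):
    """Combine per-model MOS estimates using classifier scores.

    Hypothetical illustration, not the published algorithm:
    - soft decision: average the estimates, weighted by normalized scores
    - hard decision: keep only the estimate of the highest-scoring model
    """
    if len(mos_estimates) != len(class_scores):
        raise ValueError("need one classifier score per MOS estimate")
    total = sum(class_scores)
    if total <= 0:
        raise ValueError("classifier scores must sum to a positive value")

    # Normalize the scores so they behave like probabilities.
    weights = [score / total for score in class_scores]

    if soft:
        # Soft decision: score-weighted average of all model estimates.
        return sum(w * m for w, m in zip(weights, mos_estimates))

    # Hard decision: pick the single best-matching model.
    best = max(range(len(weights)), key=weights.__getitem__)
    return mos_estimates[best]
```

With two models estimating MOS 4.0 and 2.0 and classifier scores 0.75 and 0.25, the soft decision blends both estimates while the hard decision returns only the first.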
Pages: 1935-1947
Page count: 13