Automatic Evaluation of Song Intelligibility Using Singing Adapted STOI and Vocal-Specific Features

Cited by: 8
Authors
Sharma, Bidisha [1]
Wang, Ye [2]
Affiliations
[1] Natl Univ Singapore, Elect & Comp Engn Dept, Singapore 117543, Singapore
[2] Natl Univ Singapore, Sch Comp, Singapore 117543, Singapore
Keywords
Noise measurement; Feature extraction; Interference; Correlation; Acoustics; Discrete Fourier transforms; Speech processing; Song intelligibility; language learning; song recommendation; music; vocal-specific features; modulation spectrum; excitation source; LINEAR PREDICTION; EPOCH EXTRACTION; SPEECH; MUSIC; LANGUAGE; LYRICS; PITCH;
DOI
10.1109/TASLP.2019.2955253
Chinese Library Classification number
O42 [Acoustics];
Subject classification codes
070206; 082403;
Abstract
An objective machine-driven measure of song intelligibility would be of great utility for various music information retrieval tasks. Song intelligibility mostly depends on two factors: the amount of interference caused by background accompaniment and the quality of the singing vocal. We leverage these two factors to determine the intelligibility of a song. For the first factor, we adapt a well-known method for intelligibility prediction of noisy speech, short-time objective intelligibility (STOI), to singing. The singing-adapted STOI treats the polyphonic song as a time-frequency weighted noisy version of the extracted singing vocal. We use a U-Net-based audio source separation method to extract the singing vocal from a polyphonic song. The singing vocal shares the same underlying physiological production mechanism as speech, with some differences in the pronunciation and prosody of the phonemes. Therefore, for the second factor, we introduce vocal-specific features to measure the intelligibility of the singing vocal: excitation source, spectral, and prosodic singing characteristics. We perform a detailed analysis of each of these features to establish their efficacy for quantifying song intelligibility. We train a regression model to derive the intelligibility scores using a combination of the vocal-specific features and singing-adapted STOI, obtaining a significant improvement in performance. The correlation between the intelligibility score obtained using the proposed framework and the human-rated intelligibility score is 0.81, which shows the efficacy of the proposed approach.
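The last step of the abstract (regress intelligibility scores on combined features, then correlate them with human ratings) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not name the regression model, so ordinary least squares is swapped in; the feature matrix and "human" scores are synthetic; and the names `fit_linear_regressor` and `pearson` are hypothetical helpers, not the authors' code.

```python
import numpy as np

def fit_linear_regressor(features, scores):
    """Least-squares fit of scores ~ features @ w + b; returns (w, b)."""
    # Append a column of ones so the last coefficient acts as the bias.
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(X, scores, rcond=None)
    return coef[:-1], coef[-1]

def pearson(a, b):
    """Pearson correlation coefficient between two 1-D sequences."""
    a = np.asarray(a, dtype=float) - np.mean(a)
    b = np.asarray(b, dtype=float) - np.mean(b)
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

# Synthetic stand-in data (assumption): one STOI-like score per song plus
# three placeholder "vocal-specific" features. No real audio is involved.
rng = np.random.default_rng(0)
n_songs = 200
stoi_like = rng.uniform(0.0, 1.0, n_songs)
vocal_feats = rng.normal(size=(n_songs, 3))
features = np.column_stack([stoi_like, vocal_feats])

# Synthetic "human ratings": an arbitrary linear mix of features plus noise.
human = 2.0 * stoi_like + 0.5 * vocal_feats[:, 0] + rng.normal(scale=0.3, size=n_songs)

# Fit on the first 150 songs, evaluate on the remaining 50.
train, test = slice(0, 150), slice(150, None)
w, b = fit_linear_regressor(features[train], human[train])
predicted = features[test] @ w + b
r = pearson(predicted, human[test])
print(f"correlation with held-out ratings: {r:.2f}")
```

The printed value reflects only the synthetic data above and is not comparable to the 0.81 reported in the abstract.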
Pages: 319-331
Page count: 13
Related papers
43 entries in total
  • [1] EPOCH EXTRACTION FROM LINEAR PREDICTION RESIDUAL FOR IDENTIFICATION OF CLOSED GLOTTIS INTERVAL
    ANANTHAPADMANABHA, TV
    YEGNANARAYANA, B
    [J]. IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1979, 27 (04) : 309 - 319
  • [2] [Anonymous], 2017, IEEE SIGNAL PROC LET, DOI 10.1109/LSP.2017.2662805
  • [3] THE EFFECT OF PITCH-RELATED CHANGES ON THE PERCEPTION OF SUNG VOWELS
    BENOLKEN, MS
    SWANSON, CE
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1990, 87 (04) : 1781 - 1785
  • [4] Borch D. Z., 2002, LOG PHON VOCOL, V27, P37
  • [5] Comparison of Word Intelligibility in Spoken and Sung Phrases
    Collister, Lauren B.
    Huron, David
    [J]. EMPIRICAL MUSICOLOGY REVIEW, 2008, 3 (03) : 109 - 125
  • [6] CATCHING THE LYRICS: INTELLIGIBILITY IN TWELVE SONG GENRES
    Condit-Schultz, Nathaniel
    Huron, David
    [J]. MUSIC PERCEPTION, 2015, 32 (05) : 470 - 483
  • [7] A PERCEPTUAL STUDY OF THE INFLUENCE OF PITCH ON THE INTELLIGIBILITY OF SUNG VOWELS
    DICARLO, NS
    GERMAIN, A
    [J]. PHONETICA, 1985, 42 (04) : 188 - 197
  • [8] EFFECT OF TEMPORAL ENVELOPE SMEARING ON SPEECH RECEPTION
    DRULLMAN, R
    FESTEN, JM
    PLOMP, R
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1994, 95 (02) : 1053 - 1064
  • [9] Remaking speech
    Dudley, H
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1939, 11 (02) : 169 - 177
  • [10] A Non-Intrusive Quality and Intelligibility Measure of Reverberant and Dereverberated Speech
    Falk, Tiago H.
    Zheng, Chenxi
    Chan, Wai-Yip
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2010, 18 (07) : 1766 - 1774