Detection of nonverbal vocalizations using Gaussian Mixture Models: looking for fillers and laughter in conversational speech

Cited by: 0
Authors
Krikke, Teun F. [1]
Truong, Khiet P. [1]
Affiliations
[1] Univ Twente, Human Media Interact, Enschede, Netherlands
Source
14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 | 2013
Keywords
nonverbal vocalizations; laughter; filled pauses; detection
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we analyze acoustic profiles of fillers (i.e., filled pauses, FPs) and laughter with the aim of automatically localizing these nonverbal vocalizations in a stream of audio. Among other features, we use voice quality features to capture the distinctive production modes of laughter, and spectral similarity measures to capture the stability of the oral tract that is characteristic of FPs. We perform classification experiments with Gaussian Mixture Models and various feature sets. We find that Mel-Frequency Cepstral Coefficients perform relatively well in comparison to other features for both FPs and laughter. To address the large variation in the frame-wise decision scores (e.g., log-likelihood ratios) observed in sequences of frames, we apply a median filter to these scores, which yields large performance improvements. Our analyses and results are presented within the framework of the Interspeech 2013 Computational Paralinguistics Sub-Challenge on Social Signals.
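The smoothing step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the filter length, the decision threshold, or how sequence edges are treated, so the window of 5 frames, the threshold of 0.0, and the truncated-window edge handling below are all assumptions, and the scores are made-up values rather than data from the paper. The function name `median_smooth` is hypothetical.

```python
from statistics import median


def median_smooth(scores, window=5):
    """Smooth a sequence of frame-wise scores with a sliding median.

    `window` must be a positive odd number of frames. Near the edges the
    window is truncated to the frames that exist (an assumption; the
    abstract does not say how edges are handled).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        start = max(0, i - half)
        stop = min(len(scores), i + half + 1)
        smoothed.append(median(scores[start:stop]))
    return smoothed


# Made-up frame-wise log-likelihood ratios,
# log p(x | target model) - log p(x | background model).
llr = [-1.0, -1.2, 2.0, -0.9, -1.1, -0.8, 1.5, 1.8, -0.5, 1.7, 2.1, 1.6]

# Thresholding the raw scores flips back and forth between classes.
raw_decisions = [score > 0.0 for score in llr]

# Thresholding the median-filtered scores gives one contiguous segment.
smoothed = median_smooth(llr, window=5)
decisions = [score > 0.0 for score in smoothed]
```

In this toy sequence the isolated positive frame and the isolated negative frame are both absorbed by their neighbours, at the cost of shifting the detected onset by one frame. A longer window would smooth more aggressively but blur segment boundaries further.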
Pages: 163-167
Page count: 5