A multimodal spectral approach to characterize rhythm in natural speech

Times cited: 13
Authors
Alexandrou, Anna Maria [1]
Saarinen, Timo [1]
Kujala, Jan [1]
Salmelin, Riitta [1]
Affiliation
[1] Aalto University, Department of Neuroscience and Biomedical Engineering, FI-00076 Aalto, Finland
Funding
Academy of Finland
Keywords
Habitual speaking rate; interspeaker variation; linguistic rhythm; perception; language; organization; evolution; coherence; circuits; patterns
DOI
10.1121/1.4939496
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Human utterances demonstrate temporal patterning, also referred to as rhythm. While simple oromotor behaviors (e.g., chewing) feature a salient periodic structure, conversational speech displays a time-varying, quasi-rhythmic pattern. Quantifying periodicity in speech is challenging. Unimodal spectral approaches have highlighted rhythmic aspects of speech. However, speech is a complex multimodal phenomenon that arises from the interplay of the articulatory, respiratory, and vocal systems. The present study addressed whether a multimodal spectral approach, in the form of coherence analysis between electromyographic (EMG) and acoustic signals, would characterize rhythm in natural speech more efficiently than a unimodal analysis. The main experimental task consisted of speech production at three speaking rates; a simple oromotor task served as control. EMG-acoustic coherence emerged as a sensitive means of tracking speech rhythm, whereas spectral analysis of either the EMG or the acoustic amplitude envelope alone was less informative. Coherence metrics thus appear to distinguish and highlight rhythmic structure in natural speech.
© 2016 Author(s). All article content, except where otherwise noted, is licensed under a Creative Commons Attribution 3.0 Unported License.
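The abstract's central quantity, coherence between an EMG signal and the acoustic signal, can be illustrated with a minimal sketch. The code below estimates magnitude-squared coherence, C_xy(f) = |S_xy(f)|^2 / (S_xx(f) S_yy(f)), by averaging windowed FFT segments (the Welch-style idea that SciPy's `scipy.signal.coherence` also implements), and applies it to the amplitude envelopes of two synthetic signals that share a 4 Hz rhythm. This is not the authors' analysis pipeline: the rectify-and-smooth envelope, the 1024 Hz sampling rate, the segment length, the synthetic signals, and all function names (`amplitude_envelope`, `welch_coherence`, `demo_coherence`) are hypothetical choices made for illustration.

```python
import numpy as np


def amplitude_envelope(x, fs, smooth_s=0.1):
    """Assumed simple envelope: full-wave rectify, then moving-average smooth."""
    width = max(1, int(round(fs * smooth_s)))
    kernel = np.ones(width) / width
    return np.convolve(np.abs(x), kernel, mode="same")


def welch_coherence(x, y, fs, nperseg, noverlap):
    """Magnitude-squared coherence |Sxy|^2 / (Sxx * Syy), segment-averaged.

    Spectral scaling constants are omitted because they cancel in the ratio.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.size < nperseg:
        raise ValueError("signals must have equal length >= nperseg")
    window = np.hanning(nperseg)
    step = nperseg - noverlap
    n_bins = nperseg // 2 + 1
    sxx = np.zeros(n_bins)
    syy = np.zeros(n_bins)
    sxy = np.zeros(n_bins, dtype=complex)
    for start in range(0, x.size - nperseg + 1, step):
        xs = x[start:start + nperseg]
        ys = y[start:start + nperseg]
        # Remove each segment's mean, taper it, and take a one-sided FFT.
        X = np.fft.rfft((xs - xs.mean()) * window)
        Y = np.fft.rfft((ys - ys.mean()) * window)
        sxx += np.abs(X) ** 2          # auto-spectrum of x
        syy += np.abs(Y) ** 2          # auto-spectrum of y
        sxy += X * np.conj(Y)          # cross-spectrum keeps phase information
    denom = sxx * syy
    coh = np.divide(np.abs(sxy) ** 2, denom, out=np.zeros(n_bins), where=denom > 0)
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    return freqs, coh


def demo_coherence(seed=0):
    """Hypothetical EMG-like and acoustic-like signals sharing a 4 Hz rhythm."""
    fs = 1024                                   # assumed sampling rate (Hz)
    t = np.arange(20 * fs) / fs                 # 20 s of samples
    rng = np.random.default_rng(seed)
    rhythm = 0.5 * (1.0 + np.sin(2 * np.pi * 4.0 * t))  # shared slow modulation
    emg = rhythm * rng.standard_normal(t.size)          # modulated noise bursts
    acoustic = rhythm * np.sin(2 * np.pi * 150.0 * t) + 0.5 * rng.standard_normal(t.size)
    return welch_coherence(
        amplitude_envelope(emg, fs),
        amplitude_envelope(acoustic, fs),
        fs,
        nperseg=4096,      # 4 s segments give 0.25 Hz frequency resolution
        noverlap=2048,
    )
```

Calling `freqs, coh = demo_coherence()` and comparing `coh` at `freqs == 4.0` with its values at higher frequencies shows how coherence singles out the rhythm the two signals share, since it is high only where their phase relation stays consistent across segments. Real recordings would need preprocessing and statistical choices (filtering, segment length, significance thresholds) that this sketch does not attempt to reproduce.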
Pages: 215-226
Number of pages: 12