Perceptual Information Loss due to Impaired Speech Production

Cited by: 14
Authors
Asaei, Afsaneh [1, 2]
Cernak, Milos [1]
Bourlard, Herve [1, 3]
Affiliations
[1] Ctr Parc, Idiap Res Inst, CH-1920 Martigny, Switzerland
[2] Tech Univ Munich, UnternehmerTUM, Ctr Innovat & Business Creat, D-80333 Munich, Germany
[3] Ecole Polytech Fed Lausanne, CH-1015 Lausanne, Switzerland
Keywords
Information transmission; motor speech disorders; speech production; speech perception; cortical organization; recognition; deep
DOI
10.1109/TASLP.2017.2738445
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Phonological classes define articulatory-free and articulatory-bound phone attributes. A deep neural network is used to estimate the probability of phonological classes from the speech signal. In theory, a unique combination of phone attributes forms a phoneme identity. Probabilistic inference of phonological classes thus enables estimation of their compositional phoneme probabilities. A novel information-theoretic framework is devised to quantify the information conveyed by each phone attribute and to assess the quality of speech production for the perception of phonemes. As a use case, we hypothesize that disruption in speech production leads to information loss in phone attributes, and thus to confusion in phoneme identification. We quantify the amount of information loss due to dysarthric articulation recorded in the TORGO database. A novel information measure is formulated to evaluate the deviation from ideal phone attribute production, which allows us to distinguish healthy production from pathological speech.
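The compositional idea in the abstract can be illustrated with a minimal sketch. This is not the paper's formulation, which this record does not give. The toy attribute inventory, the phoneme definitions, the assumption that attributes are independent, and the use of Shannon entropy as a stand-in for phoneme confusion are all assumptions made here for illustration.

```python
import math

# Hypothetical toy inventory: each phoneme is a unique combination of
# binary phone attributes (1 = attribute present, 0 = absent).
PHONEME_ATTRIBUTES = {
    "p": {"voiced": 0, "labial": 1, "nasal": 0},
    "b": {"voiced": 1, "labial": 1, "nasal": 0},
    "m": {"voiced": 1, "labial": 1, "nasal": 1},
    "t": {"voiced": 0, "labial": 0, "nasal": 0},
}


def phoneme_posteriors(attr_probs):
    """Compose phoneme probabilities from attribute probabilities.

    attr_probs maps each attribute name to P(attribute present), for example
    the output of a classifier. Attributes are assumed independent (an
    assumption of this sketch), and scores are normalized over the inventory.
    """
    scores = {}
    for phoneme, attrs in PHONEME_ATTRIBUTES.items():
        score = 1.0
        for name, present in attrs.items():
            p = attr_probs[name]
            score *= p if present else 1.0 - p
        scores[phoneme] = score
    total = sum(scores.values())
    if total == 0.0:
        raise ValueError("attribute probabilities match no phoneme in the inventory")
    return {phoneme: s / total for phoneme, s in scores.items()}


def entropy_bits(dist):
    """Shannon entropy in bits; higher values mean more phoneme confusion."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0.0)


# Confident attribute estimates versus fully uncertain ones.
clear = phoneme_posteriors({"voiced": 0.9, "labial": 0.8, "nasal": 0.1})
blurred = phoneme_posteriors({"voiced": 0.5, "labial": 0.5, "nasal": 0.5})
```

In this sketch, uncertain attribute estimates spread probability evenly over the inventory, so `entropy_bits(blurred)` exceeds `entropy_bits(clear)`. That mirrors, in a simplified way, the hypothesis that information loss in phone attributes leads to confusion in phoneme identification.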
Pages: 2433-2443
Number of pages: 11