Using Clinician Annotations to Improve Automatic Speech Recognition of Stuttered Speech

Cited by: 9
Authors
Heeman, Peter A. [1, 2]
Lunsford, Rebecca [1, 2]
McMillin, Andy [2, 3]
Yaruss, J. Scott [4 ]
Affiliations
[1] BioSpeech, Lake Oswego, OR 97034 USA
[2] Oregon Hlth & Sci Univ, Ctr Spoken Language Understanding, Portland, OR 97201 USA
[3] Oregon Hlth & Sci Univ, Speech & Hearing Sci, Portland, OR 97201 USA
[4] Univ Pittsburgh, Dept Commun Sci & Disorders, Pittsburgh, PA USA
Source
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016
Keywords
stuttering; automatic speech recognition; disfluency counts; user-interface
DOI
10.21437/Interspeech.2016-1388
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
In treating people who stutter, clinicians often have their clients read a story in order to determine their stuttering frequency. As the client is speaking, the clinician annotates each disfluency. For further analysis of the client's speech, it is useful to have a word transcription of what was said. However, as these are real-time annotations, they are not always correct, and they usually lag where the actual disfluency occurred. We have built a tool that rescores a word lattice taking into account the clinician's annotations. In the paper, we describe how we incorporate the clinician's annotations, and the improvement over a baseline version. This approach of leveraging clinician annotations can be used for other clinical tasks where a word transcription is useful for further or richer analysis.
Pages: 2651-2655
Page count: 5