Recognizing disfluencies in conversational speech

Cited by: 21
Authors
Lease, Matthew [1]
Johnson, Mark
Charniak, Eugene
Affiliations
[1] Brown Univ, BLLIP, Dept Comp Sci, Providence, RI 02912 USA
[2] Brown Univ, BLLIP, Dept Cognit & Linguist Sci, Providence, RI 02912 USA
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2006 / Vol. 14 / No. 5
Funding
National Science Foundation (USA);
Keywords
disfluency modeling; natural language processing; rich transcription; speech processing;
DOI
10.1109/TASL.2006.878269
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
We present a system for modeling disfluency in conversational speech: repairs, fillers, and self-interruption points (IPs). For each sentence, candidate repair analyses are generated by a stochastic tree adjoining grammar (TAG) noisy-channel model. A probabilistic syntactic language model scores the fluency of each analysis, and a maximum-entropy model selects the most likely analysis given the language model score and other features. Fillers are detected independently via a small set of deterministic rules, and IPs are detected by combining the output of the repair and filler detection modules. In the recent Rich Transcription Fall 2004 (RT-04F) blind evaluation, systems competed to detect these three forms of disfluency under two input conditions: a best-case scenario of manually transcribed words and a fully automatic case of automatic speech recognition (ASR) output. For all three tasks and on both types of input, our system was the top performer in the evaluation.
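The selection step described in the abstract (a maximum-entropy model choosing among candidate repair analyses) can be illustrated with a minimal sketch. This is not the authors' implementation: the feature names `lm_logprob` and `channel_logprob`, the weights, and the toy candidates are all assumptions made up for illustration, and the TAG noisy-channel generator and syntactic language model are replaced by hard-coded scores.

```python
import math

def rerank(candidates, weights):
    """Pick the candidate with the highest probability under a log-linear model.

    Each candidate is a dict with a hypothetical "features" mapping; the score
    is the weighted sum of feature values, normalized with a softmax.
    """
    scores = [
        sum(weights.get(name, 0.0) * value for name, value in c["features"].items())
        for c in candidates
    ]
    top = max(scores)                        # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    best_index = max(range(len(candidates)), key=probs.__getitem__)
    return candidates[best_index], probs

# Toy candidates (assumed values, not taken from the paper).
candidates = [
    {"analysis": "no repair",
     "features": {"lm_logprob": -42.0, "channel_logprob": -1.0}},
    {"analysis": "repair: [I want] I need a flight",
     "features": {"lm_logprob": -35.0, "channel_logprob": -4.0}},
]
weights = {"lm_logprob": 1.0, "channel_logprob": 1.0}  # assumed; normally learned

best, probs = rerank(candidates, weights)
print(best["analysis"], probs)
```

In this toy setup the repair analysis wins because its gain in language-model score outweighs its lower channel score; in a trained system the weights would be estimated from annotated data rather than set by hand.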
Pages: 1566-1573
Page count: 8