Recognizing disfluencies in conversational speech

Cited by: 21

Authors:

Lease, Matthew [1]

Johnson, Mark

Charniak, Eugene

Affiliations:

[1] Brown Univ, BLLIP, Dept Comp Sci, Providence, RI 02912 USA

[2] Brown Univ, BLLIP, Dept Cognit & Linguist Sci, Providence, RI 02912 USA

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2006 / Vol. 14 / Issue 5

Funding:

U.S. National Science Foundation;

Keywords:

disfluency modeling; natural language processing; rich transcription; speech processing;

DOI:

10.1109/TASL.2006.878269

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

We present a system for modeling disfluency in conversational speech: repairs, fillers, and self-interruption points (IPs). For each sentence, candidate repair analyses are generated by a stochastic tree adjoining grammar (TAG) noisy-channel model. A probabilistic syntactic language model scores the fluency of each analysis, and a maximum-entropy model selects the most likely analysis given the language model score and other features. Fillers are detected independently via a small set of deterministic rules, and IPs are detected by combining the output of the repair and filler detection modules. In the recent Rich Transcription Fall 2004 (RT-04F) blind evaluation, systems competed to detect these three forms of disfluency under two input conditions: a best-case scenario of manually transcribed words and a fully automatic case of automatic speech recognition (ASR) output. For all three tasks and on both types of input, our system was the top performer in the evaluation.
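
The selection and combination steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Candidate` class, the three feature names, the weight values, and the rule that places an IP at the end of each edited span and at the start of each filler span are all assumptions made for illustration. The TAG noisy-channel model and the syntactic language model are not reproduced here; their log-probabilities are simply supplied as inputs.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """One hypothetical repair analysis of a sentence."""
    edited: list[bool]       # True where a word is marked as part of a repair
    channel_logprob: float   # stand-in for the noisy-channel model score
    lm_logprob: float        # stand-in for the syntactic language model score


def features(c: Candidate) -> dict[str, float]:
    # Illustrative feature set; the paper's actual features are not listed in the abstract.
    return {
        "channel": c.channel_logprob,
        "lm": c.lm_logprob,
        "n_edited": float(sum(c.edited)),
    }


def rerank(candidates: list[Candidate], weights: dict[str, float]):
    """Log-linear (maximum-entropy style) selection over candidate analyses."""
    scores = [
        sum(weights.get(name, 0.0) * value for name, value in features(c).items())
        for c in candidates
    ]
    top = max(scores)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(candidates)), key=probs.__getitem__)
    return candidates[best], probs


def interruption_points(edited: list[bool], filler: list[bool]) -> list[int]:
    """Combine repair and filler output into word-boundary indices.

    Boundary b lies between word b-1 and word b. Assumed rule: an IP follows
    the last word of each edited span and precedes the first word of each
    filler span; coinciding boundaries are merged.
    """
    n = len(edited)
    ips = set()
    for i in range(n):
        if edited[i] and (i + 1 == n or not edited[i + 1]):
            ips.add(i + 1)
        if filler[i] and (i == 0 or not filler[i - 1]):
            ips.add(i)
    return sorted(ips)


# Toy sentence: "I want a flight to Boston uh I mean to Denver"
n_words = 11
no_repair = Candidate([False] * n_words, channel_logprob=-2.0, lm_logprob=-30.0)
with_repair = Candidate(
    [i in (4, 5) for i in range(n_words)],  # "to Boston" marked as edited
    channel_logprob=-5.0,
    lm_logprob=-20.0,
)
weights = {"channel": 1.0, "lm": 1.0, "n_edited": -0.5}  # made-up weights

best, probs = rerank([no_repair, with_repair], weights)
filler = [i in (6, 7, 8) for i in range(n_words)]  # "uh I mean"
ips = interruption_points(best.edited, filler)
```

With these made-up scores, the analysis that marks "to Boston" as edited is selected because its better language model score outweighs its lower channel score, and the edited span and the filler span share a single IP at the boundary before "uh".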

Pages: 1566-1573

Page count: 8