Stochastic natural language generation for spoken dialog systems

Cited by: 23
Authors
Oh, AH [1]
Rudnicky, AI [1]
Affiliations
[1] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA
Acknowledgments
We would like to thank the members of the Carnegie Mellon Speech Group and in particular the contributors to the Communicator project, without whom this work would not have been possible to carry out. We would also like to acknowledge the contribution of Lea of People's Travel in Pittsburgh, PA, US, who helped us create the necessary corpus of travel agent / client interactions. Portions of this work were first described at the April 2000 NAACL Workshop on Dialog Processing and, earlier, in presentations at principal investigator meetings of the DARPA Communicator Program. This research was sponsored in part by the Space and Naval Warfare Systems Center, San Diego, under Grant No. N66001-99-1-8905. The content of the information in this publication does not necessarily reflect the position or the policy of the US Government, and no official endorsement should be inferred.
DOI
10.1016/S0885-2308(02)00012-8
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We describe a corpus-based approach to natural language generation (NLG). The approach has been implemented as a component of a spoken dialog system and a series of evaluations were carried out. Our system uses n-gram language models, which have been found useful in other language technology applications, in a generative mode. It is not yet clear whether the simple n-grams can adequately model human language generation in general, but we show that we can successfully apply this ubiquitous modeling technique to the task of natural language generation for spoken dialog systems. In this paper, we discuss applying corpus-based stochastic language generation at two levels: content selection and sentence planning/realization. At the content selection level, output utterances are modeled by bigrams, and the appropriate attributes are chosen using bigram statistics. In sentence planning and realization, corpus utterances are modeled by n-grams of varying length, and new utterances are generated stochastically. Through this work, we show that a simple statistical model alone can generate appropriate language for a spoken dialog system. The results describe a promising avenue for using a statistical approach in future NLG systems. © 2002 Elsevier Science Ltd. All rights reserved.
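The idea of using an n-gram model "in a generative mode" can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus, the slot placeholders such as `{depart_city}`, and the function names `train_ngrams` and `generate` are all assumptions made for illustration, and only the sentence realization level is covered (content selection is not shown).

```python
import random
from collections import defaultdict

START, END = "<s>", "</s>"

# Hypothetical toy corpus of system utterances; the slot placeholders in
# braces are illustrative stand-ins for attribute values.
CORPUS = [
    "what time would you like to leave {depart_city}",
    "what time would you like to arrive in {arrive_city}",
    "would you like to leave in the morning",
]

def train_ngrams(utterances, n=2):
    """Record, for each (n-1)-word context, every word that followed it."""
    model = defaultdict(list)
    for utt in utterances:
        tokens = [START] * (n - 1) + utt.split() + [END]
        for i in range(n - 1, len(tokens)):
            context = tuple(tokens[i - n + 1:i])
            model[context].append(tokens[i])
    return model

def generate(model, n=2, rng=None, max_len=30):
    """Sample one utterance word by word, in proportion to corpus counts."""
    rng = rng or random.Random()
    context = (START,) * (n - 1)
    words = []
    while len(words) < max_len:
        # Followers are stored with repetition, so a uniform choice over
        # the list samples each word according to its observed frequency.
        nxt = rng.choice(model[context])
        if nxt == END:
            break
        words.append(nxt)
        context = (context + (nxt,))[1:]
    return " ".join(words)

if __name__ == "__main__":
    bigram_model = train_ngrams(CORPUS, n=2)
    for seed in range(3):
        print(generate(bigram_model, n=2, rng=random.Random(seed)))
```

Because every step samples a word that followed the current context somewhere in the corpus, each adjacent word pair in a bigram-generated utterance was observed in training, although the utterance as a whole may be new. A larger `n` keeps the output closer to the corpus utterances at the cost of less variety.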
Pages: 387-407
Page count: 21