Automatic annotation of context and speech acts for dialogue corpora

Cited by: 8
Authors
Georgila, Kallirroi [1]
Lemon, Oliver [2]
Henderson, James [3]
Moore, Johanna D.
Affiliations
[1] Univ So Calif, Inst Creat Technol, Marina Del Rey, CA 90292 USA
[2] Univ Edinburgh, Sch Informat, Edinburgh EH8 9AB, Midlothian, Scotland
[3] Univ Geneva, Dept Comp Sci, CH-1227 Carouge, Switzerland
Funding
Engineering and Physical Sciences Research Council (UK); Wellcome Trust (UK)
DOI
10.1017/S1351324909005105
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Richly annotated dialogue corpora are essential for new research directions in statistical learning approaches to dialogue management, context-sensitive interpretation, and context-sensitive speech recognition. In particular, large dialogue corpora annotated with contextual information and speech acts are urgently required. We explore how existing dialogue corpora (usually consisting of utterance transcriptions) can be automatically processed to yield new corpora where dialogue context and speech acts are accurately represented. We present a conceptual and computational framework for generating such corpora. As an example, we present and evaluate an automatic annotation system which builds 'Information State Update' (ISU) representations of dialogue context for the COMMUNICATOR (2000 and 2001) corpora of human-machine dialogues (2,331 dialogues). The purposes of this annotation are to generate corpora for reinforcement learning of dialogue policies, for building user simulations, for evaluating different dialogue strategies against a baseline, and for training models for context-dependent interpretation and speech recognition. The automatic annotation system parses system and user utterances into speech acts and builds up sequences of dialogue context representations using an ISU dialogue manager. We present the architecture of the automatic annotation system and a detailed example to illustrate how the system components interact to produce the annotations. We also evaluate the annotations, with respect to the task completion metrics of the original corpus and in comparison to hand-annotated data and annotations produced by a baseline automatic system. The automatic annotations perform well and largely outperform the baseline automatic annotations in all measures. The resulting annotated corpus has been used to train high-quality user simulations and to learn successful dialogue strategies. The final corpus will be made publicly available.
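The two-stage idea in the abstract (parse each utterance into a speech act, then update a running context representation) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the `InformationState` fields, the keyword rules in `parse_speech_act`, the speech-act labels, and the `dest_city` slot name are all hypothetical, and the real system's parser and ISU dialogue manager are far richer.

```python
import copy
import re
from dataclasses import dataclass, field


@dataclass
class InformationState:
    """Toy dialogue context (hypothetical fields, not the paper's schema)."""
    filled_slots: dict = field(default_factory=dict)
    history: list = field(default_factory=list)


# Hypothetical rule: a word following "to" is treated as a destination city.
CITY_PATTERN = re.compile(r"\bto (\w+)", re.IGNORECASE)


def parse_speech_act(speaker, utterance):
    """Map an utterance to a (act_type, slot, value) triple using toy rules."""
    text = utterance.strip()
    if speaker == "system" and text.endswith("?"):
        return ("request_info", None, None)
    match = CITY_PATTERN.search(text)
    if speaker == "user" and match:
        return ("provide_info", "dest_city", match.group(1))
    return ("other", None, None)


def update(state, speaker, act):
    """Return a new state: record the act and fill a slot if one was given."""
    new_state = copy.deepcopy(state)
    act_type, slot, value = act
    new_state.history.append((speaker, act_type))
    if act_type == "provide_info":
        new_state.filled_slots[slot] = value
    return new_state


def annotate(dialogue):
    """Attach a speech act and a context snapshot to every utterance."""
    state = InformationState()
    annotated = []
    for speaker, utterance in dialogue:
        act = parse_speech_act(speaker, utterance)
        state = update(state, speaker, act)
        annotated.append((speaker, utterance, act, state))
    return annotated


if __name__ == "__main__":
    dialogue = [
        ("system", "Where would you like to fly?"),
        ("user", "A flight to Boston please"),
    ]
    for speaker, utterance, act, state in annotate(dialogue):
        print(speaker, act, state.filled_slots)
```

Each utterance ends up paired with its own context snapshot, which is the property that makes such a corpus usable as state-action sequences for policy learning or user simulation.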
Pages: 315-353
Page count: 39