Automatic summarization of open-domain multiparty dialogues in diverse genres

Cited by: 57
Authors
Zechner, K [1]
Affiliation
[1] Educ Testing Serv, Princeton, NJ 08541 USA
DOI
10.1162/089120102762671945
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Automatic summarization of open-domain spoken dialogues is a relatively new research area. This article introduces the task and the challenges involved and motivates and presents an approach for obtaining automatic-extract summaries for human transcripts of multiparty dialogues of four different genres, without any restriction on domain. We address the following issues, which are intrinsic to spoken-dialogue summarization and typically can be ignored when summarizing written text such as news wire data: (1) detection and removal of speech disfluencies; (2) detection and insertion of sentence boundaries; and (3) detection and linking of cross-speaker information units (question-answer pairs). A system evaluation is performed using a corpus of 23 dialogue excerpts with an average duration of about 10 minutes, comprising 80 topical segments and about 47,000 words total. The corpus was manually annotated for relevant text spans by six human annotators. The global evaluation shows that for the two more informal genres, our summarization system using dialogue-specific components significantly outperforms two baselines: (1) a maximum-marginal-relevance ranking algorithm using TF*IDF term weighting, and (2) a LEAD baseline that extracts the first n words from a text.
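The two baselines named in the abstract can be illustrated with a minimal sketch. This is not the article's implementation: the whitespace tokenizer, the smoothed IDF formula, the `lam` trade-off parameter, and all function names below are assumptions chosen for illustration only.

```python
import math
from collections import Counter


def lead_baseline(text, n):
    """LEAD baseline: return the first n whitespace-separated words of a text."""
    return " ".join(text.split()[:n])


def tfidf_vectors(docs):
    """Turn tokenized documents into sparse {term: tf * idf} dictionaries.

    Assumption: a smoothed IDF, log((1 + N) / (1 + df)) + 1, so no weight is zero.
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: tf[t] * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t in tf}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mmr_rank(sentences, query, k, lam=0.5):
    """Greedy maximum-marginal-relevance selection of k sentence indices.

    Each step picks the sentence maximizing
    lam * sim(sentence, query) - (1 - lam) * max sim(sentence, already selected).
    """
    docs = [s.lower().split() for s in sentences] + [query.lower().split()]
    vectors = tfidf_vectors(docs)
    query_vec = vectors[-1]
    sent_vecs = vectors[:-1]

    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(sent_vecs[i], query_vec)
            redundancy = max(
                (cosine(sent_vecs[i], sent_vecs[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

For example, given two identical sentences and one unrelated sentence that also matches the query, `mmr_rank` with `k=2` selects one of the duplicates and the unrelated sentence, because the redundancy term penalizes the second duplicate.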
Pages: 447-485
Page count: 39
Related papers
50 in total
  • [31] Challenges to Open-Domain Constituency Parsing
    Yang, Sen
    Cui, Leyang
    Ning, Ruoxi
    Wu, Di
    Zhang, Yue
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022, : 112 - 127
  • [32] On Monotonic Aggregation for Open-domain QA
    Han, Sang-eun
    Jeong, Yeonseok
    Hwang, Seung-won
    Lee, Kyungjae
    INTERSPEECH 2023, 2023, : 3432 - 3436
  • [33] Towards Diverse, Relevant and Coherent Open-Domain Dialogue Generation via Hybrid Latent Variables
    Sun, Bin
    Li, Yitong
    Mi, Fei
    Wang, Weichao
    Li, Yiwei
    Li, Kan
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 11, 2023, : 13600 - 13608
  • [34] Towards an Open-Domain Dialog System
    Gao, Jianfeng
    PROCEEDINGS OF THE 2019 ACM SIGIR INTERNATIONAL CONFERENCE ON THEORY OF INFORMATION RETRIEVAL (ICTIR'19), 2019, : 1 - 1
  • [35] Entity Resolution in Open-domain Conversations
    Shang, Mingyue
    Wang, Tong
    Eric, Mihail
    Chen, Jiangning
    Wang, Jiyang
    Welch, Matthew
    Deng, Tiantong
    Grewal, Akshay
    Wang, Han
    Liu, Yue
    Kiss, Imre
    Liu, Yang
    Hakkani-Tur, Dilek
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, NAACL-HLT 2021, 2021, : 26 - 33
  • [36] Type checking in open-domain question answering
    Schlobach, S
    Olsthoorn, M
    de Rijke, M
    ECAI 2004: 16TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2004, 110 : 398 - 402
  • [37] GRADE Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems
    Huang, Lishan
    Ye, Zheng
    Qin, Jinghui
    Lin, Liang
    Liang, Xiaodan
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 9230 - 9240
  • [38] Open-domain Document-based Automatic QA Models Based on CNN and Attention Mechanism
    Zhang, Guangjie
    Fant, Xumin
    Jin, Canghong
    Wu, Minghui
    2019 10TH IEEE INTERNATIONAL CONFERENCE ON BIG KNOWLEDGE (ICBK 2019), 2019, : 326 - 332
  • [39] Detrimental Contexts in Open-Domain Question Answering
    Oh, Philhoon
    Thorne, James
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023, : 11589 - 11605
  • [40] AMBIGQA: Answering Ambiguous Open-domain Questions
    Min, Sewon
    Michael, Julian
    Hajishirzi, Hannaneh
    Zettlemoyer, Luke
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 5783 - 5797