Query-based summarization of discussion threads

Cited by: 6
Authors
Verberne, Suzan [1]
Krahmer, Emiel [2]
Wubben, Sander [2]
van den Bosch, Antal [3, 4]
Affiliations
[1] Leiden Univ, Leiden Inst Adv Comp Sci, Leiden, Netherlands
[2] Tilburg Univ, Tilburg Sch Humanities, Tilburg, Netherlands
[3] Radboud Univ Nijmegen, Ctr Language Studies, Nijmegen, Netherlands
[4] Meertens Inst, Amsterdam, Netherlands
Keywords
query-based summarization; discussion forums; reference summaries; word embeddings; evaluation; AGREEMENT; NETWORKS; DOCUMENT
DOI
10.1017/S1351324919000123
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we address query-based summarization of discussion threads. New users can profit from the information shared in the forum if they can retrieve the previously posted information. However, discussion threads on a single topic can easily comprise dozens or hundreds of individual posts. Our aim is to summarize forum threads given real web search queries. We created a data set with search queries from a discussion forum's search engine log and the discussion threads that were clicked by the user who entered the query. For 120 thread-query combinations, a reference summary was made by five different human raters. We compared two methods for automatic summarization of the threads: a query-independent method based on post features, and Maximum Marginal Relevance (MMR), a method that takes the query into account. We also compared four different word embedding representations as alternatives to standard word vectors in extractive summarization. We find (1) that the agreement between human summarizers does not improve when a query is provided; (2) that the query-independent post features as well as a centroid-based baseline outperform MMR by a large margin; (3) that combining the post features with query similarity gives a small improvement over the use of post features alone; and (4) that for the word embeddings, a match in domain appears to be more important than corpus size and dimensionality. However, the differences between the models were not reflected by differences in quality of the summaries created with the help of these models. We conclude that query-based summarization with web queries is challenging because the queries are short, and a click on a result is not a direct indicator of the relevance of the result.
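As a rough illustration of the query-aware method named in the abstract, the following is a minimal sketch of greedy MMR selection over forum posts. It is not the authors' implementation: the abstract does not specify their similarity function or parameters, so the bag-of-words cosine similarity, the function name `mmr_select`, the trade-off weight `lam`, the summary size `k`, and the example posts are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mmr_select(posts, query, k=2, lam=0.5):
    """Return indices of up to k posts chosen greedily by MMR.

    lam is an assumed trade-off weight: lam scales similarity to the
    query, (1 - lam) scales the penalty for similarity to posts that
    were already selected.
    """
    vectors = [Counter(tokenize(p)) for p in posts]
    query_vec = Counter(tokenize(query))
    selected = []
    candidates = list(range(len(posts)))

    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(vectors[i], query_vec)
            # Highest similarity to any already selected post (0 if none yet).
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical thread: posts 0 and 1 contain exactly the same words.
posts = [
    "battery drains fast after update",
    "after update battery drains fast",
    "update fixed my battery problem",
    "welcome to our forum",
]
query = "battery drains update"

print(mmr_select(posts, query, k=2, lam=0.5))  # → [0, 2]
print(mmr_select(posts, query, k=2, lam=1.0))  # → [0, 1]
```

With `lam=0.5` the duplicate post is skipped in favor of a less similar but still relevant one, while `lam=1.0` reduces the selection to plain query-relevance ranking. Very short queries leave little signal in the relevance term, which is in line with the difficulty the abstract reports for web queries.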
Pages: 3-29
Number of pages: 27