Query-based summarization of discussion threads

Cited by: 6
Authors
Verberne, Suzan [1]
Krahmer, Emiel [2]
Wubben, Sander [2]
van den Bosch, Antal [3, 4]
Affiliations
[1] Leiden Univ, Leiden Inst Adv Comp Sci, Leiden, Netherlands
[2] Tilburg Univ, Tilburg Sch Humanities, Tilburg, Netherlands
[3] Radboud Univ Nijmegen, Ctr Language Studies, Nijmegen, Netherlands
[4] Meertens Inst, Amsterdam, Netherlands
Keywords
query-based summarization; discussion forums; reference summaries; word embeddings; evaluation; AGREEMENT; NETWORKS; DOCUMENT
DOI
10.1017/S1351324919000123
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we address query-based summarization of discussion threads. New users can profit from the information shared in a forum, provided they can find the previously posted information. However, discussion threads on a single topic can easily comprise dozens or hundreds of individual posts. Our aim is to summarize forum threads given real web search queries. We created a data set with search queries from a discussion forum's search engine log and the discussion threads that were clicked by the user who entered the query. For 120 thread-query combinations, a reference summary was made by five different human raters. We compared two methods for automatic summarization of the threads: a query-independent method based on post features, and Maximum Marginal Relevance (MMR), a method that takes the query into account. We also compared four different word embedding representations as an alternative to standard word vectors in extractive summarization. We find (1) that the agreement between human summarizers does not improve when a query is provided; (2) that the query-independent post features as well as a centroid-based baseline outperform MMR by a large margin; (3) that combining the post features with query similarity gives a small improvement over the use of post features alone; and (4) that, for the word embeddings, a match in domain appears to be more important than corpus size and dimensionality. However, the differences between the embedding models were not reflected in differences in the quality of the summaries created with the help of these models. We conclude that query-based summarization with web queries is challenging because the queries are short, and a click on a result is not a direct indicator of the relevance of the result.
Pages: 3-29
Number of pages: 27
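
The abstract names Maximum Marginal Relevance (MMR) as the query-aware method. As a rough illustration only, and not the authors' implementation, the sketch below greedily picks posts that are similar to the query while penalizing similarity to posts already selected. The bag-of-words cosine similarity, the function names (`bow`, `cosine`, `mmr_select`), and the trade-off parameter `lam` are all assumptions made for this example; the paper itself works with word vectors and word embeddings.

```python
import math
from collections import Counter


def bow(text):
    """Hypothetical helper: lowercase, whitespace-split bag of words."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(query, posts, k=2, lam=0.5):
    """Return indices of up to k posts chosen greedily by the MMR criterion.

    lam is an assumed trade-off weight: 1.0 ranks by query relevance only,
    lower values penalize redundancy with already selected posts more.
    """
    q = bow(query)
    vecs = [bow(p) for p in posts]
    selected = []
    candidates = list(range(len(posts)))

    def score(i):
        relevance = cosine(vecs[i], q)
        # Highest similarity to any post already in the summary.
        redundancy = max((cosine(vecs[i], vecs[j]) for j in selected), default=0.0)
        return lam * relevance - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With a thread that contains a verbatim duplicate post, setting `lam=1.0` lets the duplicate into the summary because only query relevance counts, whereas a lower value such as `lam=0.5` tends to skip it in favor of a less redundant post.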