Query-based summarization of discussion threads

Cited by: 6

Authors:

Verberne, Suzan ^{[1]}

Krahmer, Emiel ^{[2]}

Wubben, Sander ^{[2]}

van den Bosch, Antal ^{[3,4]}

Affiliations:

[1] Leiden Univ, Leiden Inst Adv Comp Sci, Leiden, Netherlands

[2] Tilburg Univ, Tilburg Sch Humanities, Tilburg, Netherlands

[3] Radboud Univ Nijmegen, Ctr Language Studies, Nijmegen, Netherlands

[4] Meertens Inst, Amsterdam, Netherlands

Source:

NATURAL LANGUAGE ENGINEERING | 2020 / Vol. 26 / Issue 01

Keywords:

query-based summarization; discussion forums; reference summaries; word embeddings; evaluation; AGREEMENT; NETWORKS; DOCUMENT

DOI:

10.1017/S1351324919000123

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In this paper, we address query-based summarization of discussion threads. New users can profit from the information shared in the forum if they can retrieve the previously posted information. However, discussion threads on a single topic can easily comprise dozens or hundreds of individual posts. Our aim is to summarize forum threads given real web search queries. We created a data set with search queries from a discussion forum's search engine log and the discussion threads that were clicked by the user who entered the query. For 120 thread-query combinations, a reference summary was made by five different human raters. We compared two methods for automatic summarization of the threads: a query-independent method based on post features, and Maximum Marginal Relevance (MMR), a method that takes the query into account. We also compared four different word embedding representations as alternatives to standard word vectors in extractive summarization. We find that (1) the agreement between human summarizers does not improve when a query is provided; (2) the query-independent post features as well as a centroid-based baseline outperform MMR by a large margin; (3) combining the post features with query similarity gives a small improvement over the use of post features alone; and (4) for the word embeddings, a match in domain appears to be more important than corpus size and dimensionality. However, the differences between the models were not reflected by differences in quality of the summaries created with the help of these models. We conclude that query-based summarization with web queries is challenging because the queries are short, and a click on a result is not a direct indicator of the relevance of the result.
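The abstract names Maximum Marginal Relevance (MMR) as the query-aware method. As a rough illustration only, the sketch below shows the generic greedy MMR selection: each step picks the post that is most similar to the query while being least similar to the posts already chosen. It is not the authors' implementation. The bag-of-words cosine similarity, the function names `cosine` and `mmr_select`, and the parameters `k` and `lam` are assumptions made for this example; the record does not state which similarity measure or trade-off value the paper used.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(query, posts, k=2, lam=0.7):
    """Greedily pick k post indices by MMR (hypothetical helper).

    lam weights query relevance against redundancy with the posts
    that were already selected; lam=1.0 ignores redundancy.
    """
    q_vec = Counter(query.lower().split())
    vecs = [Counter(p.lower().split()) for p in posts]
    selected = []
    candidates = list(range(len(posts)))

    def score(i):
        relevance = cosine(vecs[i], q_vec)
        redundancy = max(
            (cosine(vecs[i], vecs[j]) for j in selected), default=0.0
        )
        return lam * relevance - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


posts = [
    "reset the router password",
    "reset the router password now",
    "weather is nice today",
]
print(mmr_select("router password reset", posts, k=2, lam=0.5))
```

With `lam=1.0` the selection reduces to plain ranking by query similarity; lowering `lam` penalizes near-duplicate posts more heavily.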


Pages: 3-29

Number of pages: 27
