CQASUMM: Building References for Community Question Answering Summarization Corpora

Citations: 10
Authors
Chowdhury, Tanya [1, 2]
Chakraborty, Tanmoy [2 ]
Affiliations
[1] Myntra Designs, Bengaluru, India
[2] IIIT Delhi, New Delhi, India
Source
PROCEEDINGS OF THE 6TH ACM IKDD CODS AND 24TH COMAD | 2019
Keywords
Community Question Answering; Multi Document Summarization; Summarization Corpus; Yahoo! Answers; TEXT;
DOI
10.1145/3297001.3297004
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Answers submitted to CQA forums are often elaborate, contain spam, and are marred by slurs and business promotions. It is difficult for a reader to go through numerous such answers to gauge community opinion, so summarization becomes a priority task. However, there is a dearth of neural approaches for CQA summarization due to the lack of a large-scale annotated dataset. We create CQASUMM, the first annotated CQA summarization dataset, by filtering the 4.4 million Yahoo! Answers L6 dataset. We sample threads where the best answer can double as a reference and build hundred-word summaries from them. We provide scripts to reconstruct the dataset and introduce the new task of Community Question Answering Summarization. Multi-document summarization (MDS) has been widely studied using news corpora. However, documents in CQA have higher variance and contradicting opinions. We compare popular MDS techniques and evaluate their performance on our CQA corpus. We find that most MDS workflows are built for entirely factual news corpora, whereas our corpus has a fair share of opinion-based instances too. We therefore introduce OpinioSumm, a new MDS method that outperforms the best baseline by 4.6% w.r.t. ROUGE-1 score.
Pages: 18-26
Number of pages: 9