MARES: multitask learning algorithm for Web-scale real-time event summarization

Cited by: 0
Authors
Min Yang
Wenting Tu
Qiang Qu
Kai Lei
Xiaojun Chen
Jia Zhu
Ying Shen
Affiliations
[1] Chinese Academy of Sciences, Shenzhen Institutes of Advanced Technology
[2] Shanghai University of Finance and Economics, Department of Computer Science
[3] Peking University, School of Electronics and Computer Engineering
[4] Shenzhen University, School of Computing Science
[5] South China Normal University, School of Computing Science
[6] Peking University Shenzhen Graduate School, School of Electronics and Computer Engineering
Source
World Wide Web | 2019, Vol. 22
Keywords
Multitask learning; Real-time event summarization; Relevance prediction; Document filtering
DOI
Not available
Abstract
Automatic real-time summarization of massive document streams on the Web has become an important tool for quickly transforming an overwhelming volume of documents into a novel, comprehensive and concise overview of an event for users. Significant progress has been made in static text summarization. However, most previous work does not consider the temporal features of document streams, which are valuable in real-time event summarization. In this paper, we propose a novel Multitask learning Algorithm for Web-scale Real-time Event Summarization (MARES), which leverages the benefits of supervised deep neural networks as well as a reinforcement learning algorithm to strengthen the representation learning of documents. Specifically, MARES consists of two key components: (i) a relevance prediction classifier, in which a hierarchical LSTM model is used to learn the representations of queries and documents; and (ii) a document filtering model that learns to maximize long-term rewards with a reinforcement learning algorithm, working on a document encoding layer shared with the relevance prediction component. To verify the effectiveness of the proposed model, extensive experiments are conducted on two real-life document stream datasets: the TREC Real-Time Summarization Track data and the TREC Temporal Summarization Track data. The experimental results demonstrate that our model achieves significantly better results than state-of-the-art baseline methods.
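To make the multitask structure concrete, the following is a minimal, dependency-free sketch under several loudly stated assumptions; it is not the authors' implementation. A single shared tanh layer (`encode`) stands in for the hierarchical LSTM encoder. A logistic relevance head is trained with cross-entropy. A Bernoulli push/skip filtering policy is trained with a REINFORCE-style policy gradient on discounted returns; the abstract only says "a reinforcement learning algorithm", so REINFORCE is our choice. The reward scheme (+1 for pushing a relevant document, -1 for pushing an irrelevant one, 0 for skipping), the weight `lam` on the RL loss, the discount `gamma`, and all function names are hypothetical. The joint objective is L = L_rel + lam * L_RL, and both heads backpropagate into the shared layer.

```python
import math
import random

# Hypothetical sketch, not the MARES code: a shared tanh layer replaces the
# hierarchical LSTM; two heads (relevance, filtering policy) share it.

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def encode(x, W):
    """Shared document encoding layer: h = tanh(W x)."""
    return [math.tanh(dot(row, x)) for row in W]

def discounted_returns(rewards, gamma):
    """Reward-to-go G_t = r_t + gamma * G_{t+1}, computed backwards."""
    out, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return out[::-1]

def train(stream, labels, hidden=4, epochs=500, lr=0.1, lam=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    dim = len(stream[0])
    W = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(hidden)]
    u = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]  # relevance head
    v = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]  # filtering policy head
    for _ in range(epochs):
        # 1) Roll out the push/skip policy over the document stream.
        hs, actions, rewards = [], [], []
        for x, y in zip(stream, labels):
            h = encode(x, W)
            a = 1 if rng.random() < sigmoid(dot(v, h)) else 0
            # Hypothetical reward: +1 push relevant, -1 push irrelevant, 0 skip.
            rewards.append((1.0 if y == 1 else -1.0) if a == 1 else 0.0)
            hs.append(h)
            actions.append(a)
        returns = discounted_returns(rewards, gamma)
        # 2) Accumulate gradients of L_rel + lam * L_RL at the rollout params,
        #    where L_RL = -G_t * log pi(a_t | h_t).
        gW = [[0.0] * dim for _ in range(hidden)]
        gu = [0.0] * hidden
        gv = [0.0] * hidden
        for x, y, h, a, g in zip(stream, labels, hs, actions, returns):
            d_rel = sigmoid(dot(u, h)) - y           # dL_rel / d(u.h)
            d_pol = -g * (a - sigmoid(dot(v, h)))    # dL_RL / d(v.h)
            for i in range(hidden):
                gu[i] += d_rel * h[i]
                gv[i] += lam * d_pol * h[i]
                # Both task heads backpropagate into the shared layer.
                d_pre = (d_rel * u[i] + lam * d_pol * v[i]) * (1.0 - h[i] ** 2)
                for j in range(dim):
                    gW[i][j] += d_pre * x[j]
        # 3) Gradient-descent step on all parameters.
        for i in range(hidden):
            u[i] -= lr * gu[i]
            v[i] -= lr * gv[i]
            for j in range(dim):
                W[i][j] -= lr * gW[i][j]
    return W, u, v

# Toy stream: two features plus a trailing bias term; label 1 = relevant.
stream = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
labels = [1, 0, 1, 0]
W, u, v = train(stream, labels)
for x in stream[:2]:
    h = encode(x, W)
    print("p(relevant) =", round(sigmoid(dot(u, h)), 3),
          "p(push) =", round(sigmoid(dot(v, h)), 3))
```

Gradients are summed over the whole stream before any parameter changes, so the policy-gradient term is evaluated at the same parameters that generated the actions. Replacing `encode` with an LSTM-based encoder over sentences and words would move this sketch toward the hierarchical architecture the abstract describes.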
Pages: 499-515
Number of pages: 16