Automatic Twitter Topic Summarization With Speech Acts

Cited by: 37
Authors
Zhang, Renxian [1 ,2 ]
Li, Wenjie [2 ]
Gao, Dehong [1 ,2 ]
Ouyang, You
Affiliations
[1] Hong Kong Polytech Univ, Shenzhen Res Inst, Innovat Intelligent Comp Ctr, Shenzhen 518055, Peoples R China
[2] Hong Kong Polytech Univ, Dept Comp, Kowloon, Hong Kong, Peoples R China
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2013 / Vol. 21 / No. 03
Keywords
Twitter; speech act; abstractive summarization; key word/phrase extraction;
DOI
10.1109/TASL.2012.2229984
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
With the growth of the social media service of Twitter, automatic summarization of Twitter messages (tweets) is in urgent need for efficient processing of the massive tweeted information. Unlike multi-document summarization in general, Twitter topic summarization must handle the numerous, short, dissimilar, and noisy nature of tweets. To address this challenge, we propose a novel speech act-guided summarization approach in this work. Speech acts characterize tweeters' communicative behavior and provide an organized view of their messages. Speech act recognition is a multi-class classification problem, which we solve by using word-based and symbol-based features that capture both the linguistic features of speech acts and the particularities of Twitter text. The recognized speech acts in tweets are then used to direct the extraction of key words and phrases to fill in templates designed for speech acts. Leveraging high-ranking words and phrases as well as topic information for major speech acts, we propose a round-robin algorithm to generate template-based summaries. Different from the extractive method adopted in most previous works, our summarization method is abstractive. Evaluated on two 100-topic datasets, the summaries generated by our method outperform two kinds of representative extractive summaries and rival human-written summaries in terms of explanatoriness and informativeness.
Pages: 649-658
Number of pages: 10