NE-Rank: A Novel Graph-based Keyphrase Extraction in Twitter

Citations: 48
Authors
Bellaachia, Abdelghani [1]
Al-Dhelaan, Mohammed [1]
Affiliations
[1] George Washington Univ, Comp Sci Dept, Washington, DC 20052 USA
Source
2012 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE AND INTELLIGENT AGENT TECHNOLOGY (WI-IAT 2012), VOL 1 | 2012
Keywords
Keyphrase Extraction; Graph-based Ranking; Hashtag; Twitter; PageRank; TextRank; NE-Rank;
DOI
10.1109/WI-IAT.2012.82
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The massive growth of the micro-blogging service Twitter has shed light on the challenging problem of summarizing a large collection of tweets. This paper attempts to extract topical keyphrases that represent the topics in tweets. Because tweets are informal, noisy, and short, this task is nontrivial. We tackle these challenges with an extensive preprocessing approach, followed by the introduction of new features that improve topical keyphrase extraction in Twitter. We start by proposing a novel unsupervised graph-based keyword ranking method, called NE-Rank, that considers word weights in addition to edge weights when calculating the ranking. We then introduce a new approach that leverages hashtags when extracting keyphrases. We conducted a set of experiments showing the potential of both approaches, with a 16% to 39% improvement for NE-Rank and a 20% improvement for hashtag-enhanced extraction.
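The abstract does not give the NE-Rank formula, so the following is only a minimal sketch of what "considering word weights in addition to edge weights" could look like in a TextRank-style iteration. The function name `node_edge_rank`, the update rule, the damping value, and the assumption that word weights lie in [0, 1] (for example, normalized TF-IDF) are all illustrative choices made here, not details taken from the paper.

```python
from collections import defaultdict


def node_edge_rank(edges, node_weight, damping=0.85, iterations=50):
    """Rank words on an undirected, weighted co-occurrence graph.

    edges:       dict mapping (word_u, word_v) -> edge weight (hypothetical input format)
    node_weight: dict mapping word -> weight in [0, 1] (assumed, e.g. normalized TF-IDF)
    """
    # Build a symmetric adjacency map from the undirected edge list.
    adj = defaultdict(dict)
    for (u, v), w in edges.items():
        adj[u][v] = w
        adj[v][u] = w

    nodes = list(adj)
    # Total edge weight leaving each node, used to normalize its contributions.
    out_sum = {u: sum(adj[u].values()) for u in nodes}
    score = {u: 1.0 for u in nodes}

    for _ in range(iterations):
        new_score = {}
        for v in nodes:
            # Edge-weighted share of each neighbor's current score.
            incoming = sum(adj[u][v] / out_sum[u] * score[u] for u in adj[v])
            w_v = node_weight.get(v, 0.0)
            # Assumed update: the word's own weight scales both the
            # teleport term and the propagated term.
            new_score[v] = (1 - damping) * w_v + damping * w_v * incoming
        score = new_score
    return score
```

With this update, two words that occupy the same position in the graph are separated by their own weights, which plain edge-weighted ranking cannot do. For example, calling `node_edge_rank({("a", "b"): 1.0, ("b", "c"): 1.0}, {"a": 0.1, "b": 0.5, "c": 1.0})` ranks "c" above "a" even though both connect only to "b".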
Pages: 372-379
Number of pages: 8