Mining the interests of Chinese microbloggers via keyword extraction

Cited by: 64
Authors
Liu, Zhiyuan [1]
Chen, Xinxiong [1]
Sun, Maosong [1]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, State Key Lab Intelligent Technol & Syst, Natl Lab Informat Sci & Technol, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
microblogging; Sina Weibo; Chinese keyword extraction; user interests
DOI
10.1007/s11704-011-1174-8
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Microblogging provides a new platform for communicating and sharing information among Web users. Users can express opinions and record daily life using microblogs. Microblogs that are posted by users indicate their interests to some extent. We aim to mine user interests via keyword extraction from microblogs. Traditional keyword extraction methods are usually designed for formal documents such as news articles or scientific papers. Messages posted by microblogging users, however, are usually noisy and full of new words, which is a challenge for keyword extraction. In this paper, we combine a translation-based method with a frequency-based method for keyword extraction. In our experiments, we extract keywords for microblog users from the largest microblogging website in China, Sina Weibo. The results show that our method can identify users' interests accurately and efficiently.
Pages: 76-87
Page count: 12