Keyword Extraction from News Articles Based on PageRank Algorithm

Cited by: 0
Authors
Gu Y.-R. [1]
Xu M.-X. [1]
Affiliation
[1] College of Automation, Nanjing University of Posts and Telecommunications, Nanjing
Source
Dianzi Keji Daxue Xuebao/Journal of the University of Electronic Science and Technology of China | 2017, Vol. 46, No. 5
Keywords
Complex networks; Keyword extraction; Natural language; PageRank; Term-frequency-shared weight
DOI
10.3969/j.issn.1001-0548.2017.05.021
Abstract
Most existing keyword extraction methods based on complex networks ignore the characteristics of natural language when building the weighted text network, and they make little use of the classical algorithms of the complex-network field. Building on the PageRank algorithm, we propose a keyword extraction method named LTWPR (Located and TF-Weighted PageRank), which takes into account both term-frequency features and the characteristics of human language. The algorithm introduces a term-frequency-shared weight that distributes each node's term-frequency value among its links, and it defines a position weight coefficient to express how the importance of a word varies with its position in a news article. LTWPR thus considers both the local and the global features of the text network, making the results more accurate. Comprehensive experiments were conducted on news articles crawled from Sina News. The experimental results show that the LTWPR algorithm is more effective and better covers the keywords tagged by the authors. © 2017, Editorial Board of Journal of the University of Electronic Science and Technology of China. All rights reserved.
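The abstract names LTWPR's ingredients but does not give its formulas. As a rough illustration only, the Python sketch below combines them: a word co-occurrence network, a term-frequency value that each node shares out to its links, a position coefficient, and a weighted PageRank iteration. Everything concrete here is an assumption rather than the authors' definition: the co-occurrence window, the title-versus-body position coefficient (`title_weight`), the rule that splits a node's term frequency equally across its links and sums both endpoints' shares into the edge weight, and the hypothetical function name `ltwpr_sketch`.

```python
from collections import Counter, defaultdict


def ltwpr_sketch(title, body, window=2, title_weight=2.0, d=0.85, iters=200, tol=1e-10):
    """Rank words with a TF-shared, position-weighted PageRank (illustrative only)."""
    tf = Counter(title) + Counter(body)
    title_set = set(title)
    # Assumed position coefficient: title words get `title_weight`, body-only words 1.0.
    pos = {w: (title_weight if w in title_set else 1.0) for w in tf}

    # Undirected co-occurrence links between distinct words inside a sliding window
    # (window=2 links adjacent words only); title and body are linked separately.
    neighbors = defaultdict(set)
    for seq in (title, body):
        for i, w in enumerate(seq):
            for j in range(i + 1, min(i + window, len(seq))):
                if seq[j] != w:
                    neighbors[w].add(seq[j])
                    neighbors[seq[j]].add(w)
    if not neighbors:
        return []

    # Assumed TF sharing: each node splits pos * tf equally over its links,
    # and an edge's weight is the sum of the shares from both endpoints.
    share = {w: pos[w] * tf[w] / len(neighbors[w]) for w in neighbors}
    weight = {(a, b): share[a] + share[b] for a in neighbors for b in neighbors[a]}
    strength = {a: sum(weight[(a, b)] for b in neighbors[a]) for a in neighbors}

    # Weighted PageRank iteration in the TextRank form (1 - d) + d * sum(...).
    score = {w: 1.0 for w in neighbors}
    for _ in range(iters):
        new = {
            v: (1 - d) + d * sum(weight[(u, v)] / strength[u] * score[u] for u in neighbors[v])
            for v in neighbors
        }
        delta = max(abs(new[v] - score[v]) for v in neighbors)
        score = new
        if delta < tol:
            break
    return sorted(score.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    title = ["pagerank", "keyword"]
    body = ["pagerank", "keyword", "extraction", "news", "keyword", "pagerank"]
    for word, s in ltwpr_sketch(title, body):
        print(f"{word}\t{s:.3f}")
```

On the toy tokens above, `keyword`, which has the most and heaviest links, ranks first. Because the edge weights are symmetric, the iteration behaves like PageRank on a weighted undirected graph, so words that are frequent, well connected, and boosted by the position coefficient accumulate the highest scores.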
Pages: 777-783
Number of pages: 6