Keyphrase extraction from Chinese news web pages based on semantic relations

Cited by: 0
Authors
Xie, Fei [1 ,4 ]
Wu, Xindong [1 ,2 ]
Hu, Xue-Gang [1 ]
Wang, Fei-Yue [3 ]
Affiliations
[1] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei 230009, Peoples R China
[2] Univ Vermont, Dept Comp Sci, Burlington, VT 05405 USA
[3] Chinese Acad Sci, Inst Automat, Beijing 100864, Peoples R China
[4] Hefei Teachers Coll, Dept Comp Sci & Technol, Hefei 230061, Peoples R China
Source
INTELLIGENCE AND SECURITY INFORMATICS, PROCEEDINGS | 2008 / Vol. 5075
Keywords
keyphrase extraction; semantic relation; word similarity; word co-occurrence; lexical chain
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Keyphrases save readers time when browsing news web pages. This paper presents a new method for extracting keyphrases from Chinese news web pages based on semantic relations. Semantic relations between phrases are analyzed, and a lexical chain is used to construct a semantic relation graph. Keyphrases are extracted, and a semantic link graph is built on the lexical chains. News web pages with core hints were selected from www.163.com to test the method. The experimental results show that the proposed method substantially outperforms the method based on term frequency, especially when three keyphrases are extracted: precision improves by 26.97 percent and recall improves by 20.93 percent.
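The comparison described in the abstract (a term-frequency baseline scored by precision and recall at a fixed number of extracted keyphrases) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the tie-breaking rule, and the toy English data are all assumptions made for illustration, and the abstract does not specify the exact evaluation protocol.

```python
from collections import Counter


def top_k_by_frequency(phrases, k):
    """Term-frequency baseline (illustrative): rank candidates by raw count."""
    counts = Counter(phrases)
    # Sort by descending count, then alphabetically so ties are deterministic.
    # The tie-breaking rule is an assumption, not taken from the paper.
    ranked = sorted(counts, key=lambda p: (-counts[p], p))
    return ranked[:k]


def precision_recall(extracted, reference):
    """Precision and recall of extracted keyphrases against a reference set."""
    extracted_set, reference_set = set(extracted), set(reference)
    hits = len(extracted_set & reference_set)
    precision = hits / len(extracted_set) if extracted_set else 0.0
    recall = hits / len(reference_set) if reference_set else 0.0
    return precision, recall


# Hypothetical toy data: candidate phrases from one page and the
# reference keyphrases (standing in for the page's "core hints").
candidates = ["economy", "market", "economy", "policy",
              "market", "economy", "bank"]
reference = ["economy", "policy", "bank", "inflation"]

top3 = top_k_by_frequency(candidates, 3)
precision, recall = precision_recall(top3, reference)
print(top3, precision, recall)
```

A semantic-relation method would replace `top_k_by_frequency` with its own ranking while keeping `precision_recall` unchanged, so both methods are scored against the same reference keyphrases.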
Pages: 490 / +
Page count: 2