An automatic keyphrase extraction system for scientific documents

Cited by: 0
Authors
Wei You
Dominique Fontaine
Jean-Paul Barthès
Affiliations
[1] Université de Technologie de Compiègne, HEUDIASYC UMR CNRS 6599
[2] Centre de Recherches de Royallieu
Source
Knowledge and Information Systems | 2013 / Vol. 34
Keywords
Information retrieval; Automatic indexing; Keyphrases extraction; Candidate phrase identification; Scientific document processing
Abstract
Automatic keyphrase extraction techniques play an important role in many tasks, including indexing, categorizing, summarizing, and searching. In this paper, we develop and evaluate an automatic keyphrase extraction system for scientific documents. Compared with previous work, our system concentrates on two important issues: (1) more precise location of potential keyphrases: a new candidate phrase generation method based on the core word expansion algorithm is proposed, which can reduce the size of the candidate set by about 75% without increasing the computational complexity; (2) overlap elimination for the output list: when a phrase and its sub-phrases coexist as candidates, an inverse document frequency feature is introduced to select the proper granularity. Additional new features are added for phrase weighting. Experiments on real-world datasets were carried out to evaluate the proposed system. The results show the efficiency and effectiveness of the refined candidate set and demonstrate that the new features improve the accuracy of the system. The overall performance of our system compares favorably with other state-of-the-art keyphrase extraction systems.
Pages: 691-724
Number of pages: 33