Automatic Keyphrase Extraction based on NLP and Statistical Methods

Times cited: 0
Authors
Dostal, Martin [1]
Jezek, Karel [1]
Affiliation
[1] Univ West Bohemia, Fac Sci Appl, Dept Comp Sci & Engn, Plzen, Czech Republic
Source
DATESO 2011: DATABASES, TEXTS, SPECIFICATIONS, OBJECTS | 2011 / Vol. 706
Keywords
keyphrase extraction; WordNet; TextRank; TFIDF; NLP
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This article presents an experimental approach to automatic keyphrase extraction based on statistical methods and WordNet-based pattern evaluation. Automatic keyphrases are important for automatic tagging and clustering because manually assigned keyphrases are insufficient in most cases. Keyphrase candidates are extracted in a new way derived from a combination of graph methods (TextRank) and statistical methods (TF*IDF). Keyword candidates are merged with named entities and stop words according to natural-language part-of-speech (POS) patterns. Automatic keyphrases are generated as TF*IDF-weighted unigrams. Keyphrases describe the main ideas of documents in a human-readable way. The approach is evaluated on articles extracted from news websites. Each article carries manually assigned topics/categories, which are used for keyword evaluation.
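As a rough illustration of the TF*IDF weighting of unigrams mentioned in the abstract, the following minimal Python sketch scores each term of each document. It is not the authors' implementation: whitespace tokenization, the relative-frequency TF, the unsmoothed logarithmic IDF, and the names `tfidf_unigrams` and `top_terms` are all assumptions made for illustration, and the TextRank, named-entity, and POS-pattern steps are left out.

```python
import math
from collections import Counter


def tfidf_unigrams(docs):
    """Return one dict per document mapping each unigram to its TF*IDF weight.

    Assumptions (not taken from the paper): lowercase whitespace tokens,
    TF = count / document length, IDF = log(N / document frequency).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


def top_terms(doc_weights, k=3):
    """Return the k highest-weighted unigrams of one document (hypothetical helper)."""
    return sorted(doc_weights, key=doc_weights.get, reverse=True)[:k]


docs = [
    "keyphrase extraction ranks candidate terms",
    "graph ranking scores candidate words",
    "news articles carry manually assigned topics",
]
scores = tfidf_unigrams(docs)
```

With this weighting, a term that occurs in every document gets weight zero, while terms concentrated in a single document rank highest, which is the property that makes TF*IDF useful for selecting keyword candidates.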
Pages: 140-145
Number of pages: 6