Automatic Keyphrase Extraction based on NLP and Statistical Methods

Cited by: 0

Authors:

Dostal, Martin ^{[1]}

Jezek, Karel ^{[1]}

Affiliations:

[1] Univ West Bohemia, Fac Sci Appl, Dept Comp Sci & Engn, Plzen, Czech Republic

Source:

DATESO 2011: DATABASES, TEXTS, SPECIFICATIONS, OBJECTS | 2011 / Vol. 706

Keywords:

keyphrase extraction; Wordnet; TextRank; TFIDF; NLP;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812 ;

Abstract:

This article presents an experimental approach to automatic keyphrase extraction based on statistical methods and Wordnet-based pattern evaluation. Automatically extracted keyphrases are important for automatic tagging and clustering because manually assigned keyphrases are insufficient in most cases. Keyphrase candidates are extracted in a new way that combines graph methods (TextRank) with statistical methods (TF*IDF). Keyword candidates are merged with named entities and stop words according to natural-language POS (part-of-speech) patterns. Automatic keyphrases are generated as TF*IDF-weighted unigrams. Keyphrases describe the main ideas of documents in a human-readable way. The approach is evaluated on articles extracted from news websites. Each article contains manually assigned topics/categories, which are used for keyword evaluation.

Pages: 140-145

Page count: 6