YAKE! Collection-Independent Automatic Keyword Extractor

Cited by: 124
Authors
Campos, Ricardo [1, 2]
Mangaravite, Vitor [2]
Pasquali, Arian [2]
Jorge, Alipio Mario [2, 3]
Nunes, Celia [4]
Jatowt, Adam [5]
Affiliations
[1] Polytech Inst Tomar, Tomar, Portugal
[2] LIAAD INESC TEC, Porto, Portugal
[3] Univ Porto, DCC FCUP, Porto, Portugal
[4] Univ Beira Interior, Covilha, Portugal
[5] Kyoto Univ, Kyoto, Japan
Source
ADVANCES IN INFORMATION RETRIEVAL (ECIR 2018) | 2018 / Vol. 10772
Keywords
Keyword extraction; Information extraction; Text mining;
DOI
10.1007/978-3-319-76941-7_80
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In this paper, we present YAKE!, a novel feature-based system for multilingual keyword extraction from single documents, which supports texts of different sizes, domains, and languages. Unlike most systems, YAKE! does not rely on dictionaries or thesauri, nor is it trained on any corpora. Instead, we follow an unsupervised approach that builds on features extracted from the text itself, making it applicable to documents written in many different languages without the need for external knowledge. This can benefit the many tasks and situations in which access to training corpora is limited or restricted. In this demo, we offer an easy-to-use, interactive session in which users from both academia and industry can try our system, either with a sample document or with their own text. As an add-on, we compare our extracted keywords against the output produced by the IBM Natural Language Understanding (IBM NLU) and RAKE systems. The YAKE! demo is available at http://bit.ly/YakeDemoECIR2018. A Python implementation of YAKE! is also available in the PyPI repository (https://pypi.python.org/pypi/yake/).
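The abstract describes keyword extraction that uses only features of the single input document, with no training corpus or dictionary. The following is a minimal sketch of that general idea, not the YAKE! algorithm itself: the function name `toy_keyword_scores`, the two features used (term frequency and position of first occurrence), and the length filter `min_len` are all assumptions made for illustration and do not come from the paper.

```python
import math
import re
from collections import Counter


def toy_keyword_scores(text, top=5, min_len=4):
    """Rank single words using only statistics of the text itself.

    Hypothetical illustration of corpus-free scoring; this is NOT the
    YAKE! scoring formula, and the features are assumed for the example.
    """
    # Unicode-aware tokenization: runs of letters in any script.
    words = re.findall(r"[^\W\d_]+", text.lower())
    # Assumption: very short tokens are treated as non-candidates,
    # which avoids needing a language-specific stopword list.
    candidates = [w for w in words if len(w) >= min_len]
    if not candidates:
        return []

    freq = Counter(candidates)

    # Token index at which each word first appears.
    first_pos = {}
    for i, w in enumerate(words):
        first_pos.setdefault(w, i)

    scores = {}
    for w, f in freq.items():
        # Words that occur often and appear early score higher.
        position_weight = 1.0 / math.log(first_pos[w] + 2.0)
        scores[w] = f * position_weight

    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top]


if __name__ == "__main__":
    sample = (
        "Keyword extraction finds keyword candidates. "
        "Extraction of a keyword is useful."
    )
    for word, score in toy_keyword_scores(sample, top=3):
        print(f"{word}\t{score:.3f}")
```

Because the sketch reads nothing but the input text, it can be applied to documents in any language that separates words with whitespace or punctuation, which is the property the abstract emphasizes. For the authors' actual feature set and scoring, the `yake` package on PyPI is the authoritative implementation.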
Pages: 806-810
Number of pages: 5