Unsupervised Topic-Oriented Keyphrase Extraction and Its Application to Croatian

Cited by: 0
Authors
Saratlija, Josip [1]
Snajder, Jan [1]
Basic, Bojana Dalbelo [1]
Affiliations
[1] Univ Zagreb, Fac Elect Engn & Comp, Zagreb 41000, Croatia
Source
TEXT, SPEECH AND DIALOGUE, TSD 2011 | 2011 / Vol. 6836
Keywords
Information extraction; keyphrase extraction; unsupervised learning; k-means; Croatian language;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Labeling documents with keyphrases is a tedious and expensive task. Most approaches to automatic keyphrase extraction rely on supervised learning and require manually labeled training data. In this paper we propose a fully unsupervised keyphrase extraction method that differs from the usual generic keyphrase extractor in the way the keyphrases are formed. Our method begins by building topically related word clusters from which document keywords are selected, and then expands the selected keywords into syntactically valid keyphrases. We evaluate our approach on a Croatian document collection annotated by eight human experts, taking into account the high subjectivity of the keyphrase extraction task. The proposed method reaches up to F1 = 44.5%, which is below the performance of human annotators but comparable to that of a supervised approach.
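The abstract reports performance as an F1 score. The following is a minimal sketch of how such a score could be computed for one document against a single annotator's keyphrase set. The function name `f1_score`, the exact-match criterion after lowercasing, and the sample phrases are assumptions made for illustration; this record does not state how the paper matches phrases or how it aggregates over the eight annotators.

```python
def f1_score(extracted, gold):
    """F1 between extracted keyphrases and one annotator's keyphrases.

    Assumption: two phrases match only if they are identical after
    trimming whitespace and lowercasing. The paper may use a different
    matching criterion.
    """
    extracted = {p.strip().lower() for p in extracted}
    gold = {p.strip().lower() for p in gold}
    if not extracted or not gold:
        return 0.0

    matches = len(extracted & gold)
    if matches == 0:
        return 0.0

    precision = matches / len(extracted)  # share of extracted phrases that are correct
    recall = matches / len(gold)          # share of gold phrases that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical sample data, not taken from the paper's collection.
system_phrases = ["keyphrase extraction", "k-means", "Croatian language", "topic"]
annotator_phrases = ["Keyphrase extraction", "unsupervised learning", "k-means"]

score = f1_score(system_phrases, annotator_phrases)
print(f"F1 = {score:.3f}")
```

Here two of the four extracted phrases match two of the three annotator phrases, so precision is 1/2 and recall is 2/3. With several annotators, one plausible option is to compute this score per annotator and average the results, but that choice is also an assumption.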
Pages: 340-347
Page count: 8