A semantic approach for extracting domain taxonomies from text

Cited by: 47
Authors
Meijer, Kevin [1]
Frasincar, Flavius [1]
Hogenboom, Frederik [1]
Affiliations
[1] Erasmus Univ, NL-3000 DR Rotterdam, Netherlands
Keywords
Taxonomy learning; Word sense disambiguation; Term extraction; Subsumption method; Semantic taxonomy evaluation;
DOI
10.1016/j.dss.2014.03.006
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper we present a framework for the automatic building of a domain taxonomy from text corpora, called Automatic Taxonomy Construction from Text (ATCT). This framework comprises four steps. First, terms are extracted from a corpus of documents. Second, the extracted terms that are most relevant for a specific domain are selected using a filtering approach. Third, the selected terms are disambiguated by means of a word sense disambiguation technique and concepts are generated. In the final step, the broader-narrower relations between concepts are determined using a subsumption technique that makes use of concept co-occurrences in a text. For evaluation, we assess the performance of the ATCT framework using the semantic precision, semantic recall, and the taxonomic F-measure, which take into account the concept semantics. The proposed framework is evaluated in the field of economics and management as well as the medical domain. (C) 2014 Elsevier B.V. All rights reserved.
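The final step of the abstract, deriving broader-narrower relations from concept co-occurrences, can be illustrated with a minimal sketch. It assumes the commonly described document-level subsumption rule (concept x is broader than concept y when P(x|y) is at least a threshold and P(y|x) is below 1); the abstract does not state the exact rule or threshold used in ATCT, so the function name `subsumption_pairs`, the default threshold of 0.8, and the toy input are all illustrative assumptions.

```python
from collections import Counter
from itertools import permutations


def subsumption_pairs(docs, threshold=0.8):
    """Return sorted (broader, narrower) concept pairs.

    docs: iterable of collections of concepts, one collection per document.
    Assumed rule (not taken from the paper): x subsumes y when
    P(x | y) >= threshold and P(y | x) < 1, with probabilities
    estimated from document-level co-occurrence counts.
    """
    doc_freq = Counter()  # number of documents containing each concept
    co_freq = Counter()   # number of documents containing both concepts
    for concepts in docs:
        unique = set(concepts)
        doc_freq.update(unique)
        # Count every ordered pair so both directions can be tested.
        co_freq.update(permutations(sorted(unique), 2))

    pairs = []
    for (x, y), n_xy in co_freq.items():
        p_x_given_y = n_xy / doc_freq[y]
        p_y_given_x = n_xy / doc_freq[x]
        if p_x_given_y >= threshold and p_y_given_x < 1.0:
            pairs.append((x, y))
    return sorted(pairs)


if __name__ == "__main__":
    # Hypothetical toy corpus: each set holds the concepts found in one document.
    toy_docs = [
        {"finance", "bank"},
        {"finance", "bank", "loan"},
        {"finance", "stock"},
        {"finance"},
    ]
    for broader, narrower in subsumption_pairs(toy_docs):
        print(f"{broader} > {narrower}")
```

The sketch returns every pair that satisfies the rule, including transitive ones; a full taxonomy builder would still have to pick a single parent per concept, which is not attempted here.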
Pages: 78-93
Number of pages: 16