An Information-Theoretic Foundation for the Measurement of Discrimination Information

Cited by: 9
Authors
Cai, Di [1]
Affiliations
[1] Wolverhampton Univ, Sch Comp & IT, Ctr Technol, Wolverhampton WV1 1LY, W Midlands, England
Keywords
Statistical semantic analysis; measurement of discrimination information; measurement of semantic relatedness; informative term identification; key term extraction; text mining; information retrieval; SEMANTIC SIMILARITY; RETRIEVAL
DOI
10.1109/TKDE.2009.134
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Hitherto, it has not been easy to interpret the meaning of the amount of discrimination information conveyed by a term rationally and explicitly within practical application contexts, nor has it been simple to introduce the concept of the extent of semantic relatedness between two terms meaningfully and successfully into scientific discussions. This study is part of an attempt to do both. We attempt to answer two important questions: 1) What is the discrimination information conveyed by a term, and how can it be measured? 2) What is the relatedness between two terms, and how can it be estimated? We focus on the first question and present an in-depth investigation into discrimination measures based on several information measures that are widely used in a variety of applications. The relatedness measures are then naturally defined according to the individual discrimination measures. Some key points are made to clarify potential problems arising from the use of the relatedness measures, and solutions are suggested. Two example applications in the contexts of text mining and information retrieval are provided. The aim of the study of which this paper forms part is to establish a unified theoretical framework, with measurement of discrimination information (MDI) at its core, for achieving effective measurement of semantic relatedness (MSR). Owing to its generality, our method can be expected to be a useful tool with a wide range of application areas.
Pages: 1262-1273
Number of pages: 12