Semantic Evaluation of Text Clustering

Cited by: 0
Authors
Sinh Hoa Nguyen [1]
Wojciech Swieboda [2]
Hung Son Nguyen [2]
Affiliations
[1] Polish Japanese Inst Inf Technol, Koszykowa 86, PL-02008 Warsaw, Poland
[2] Warsaw Univ, Inst Math, Banacha 2, PL-02097 Warsaw, Poland
Source
ADVANCED COMPUTATIONAL METHODS FOR KNOWLEDGE ENGINEERING | 2014 / Vol. 282
Keywords
Text clustering; semantic evaluation; PubMed; MeSH
DOI
10.1007/978-3-319-06569-4_20
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we investigate the problem of analyzing the quality of clustering results using semantic annotations given by experts. We propose a novel approach to constructing an evaluation measure, called SEE (Semantic Evaluation by Exploration), which improves on existing methods such as the Rand Index or Normalized Mutual Information. We illustrate the proposed evaluation method on freely accessible biomedical research articles from PubMed Central (PMC). Many articles in PubMed Central are annotated by experts using the Medical Subject Headings (MeSH) thesaurus. We compare different semantic techniques for search result clustering using the proposed measure.
Pages: 269-280
Number of pages: 12