Cluster Validation Techniques for Bibliographic Databases

Cited by: 0
Authors
Mishra, Sumit [1]
Saha, Sriparna [1]
Mondal, Samrat [1]
Affiliations
[1] Indian Inst Technol Patna, Dept Comp Sci & Engn, Patna 800013, Bihar, India
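Source
2014 IEEE Students' Technology Symposium (IEEE TechSym) | 2014
Keywords
Bibliographic Database; Entity name disambiguation; Validity Index; Golden Standard; Validity Measure
DOI
Not available
Chinese Library Classification codes
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract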
In entity name disambiguation, records that refer to the same entity are clustered together. A major challenge is validating the result, because the correct clustering is often unknown or difficult to obtain. Three commonly used evaluation measures in this context are precision, recall, and F-measure. All three are external validity indices, since they require gold-standard data. In bibliographic databases such as DBLP, ArnetMiner, Scopus, and Web of Science, however, obtaining a gold standard for every entity is very difficult, so other metrics are needed to evaluate performance on bibliographic data. This paper uses a novel scheme based on internal validity indices to evaluate the performance of an entity name disambiguation algorithm. Several distance measures are used to compute the similarity between two records, and these functions are then incorporated into the definitions of the internal validity indices.
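The abstract does not name the internal validity indices or distance measures used in the paper. As an illustration only, the sketch below assumes one well-known internal index (the Dunn index) and one common set-based distance (Jaccard distance), and treats each bibliographic record as a set of tokens such as coauthor names and title words. The function names and the toy records are hypothetical and are not taken from the paper. The point is that the index needs only the records and a distance function, not gold-standard labels.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |A & B| / |A | B| for two records given as token sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty records are treated as identical
    return 1.0 - len(a & b) / len(union)


def dunn_index(clusters, distance):
    """Dunn index: smallest between-cluster distance / largest cluster diameter.

    Higher values suggest compact, well-separated clusters.
    """
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    # Largest distance between two records inside the same cluster.
    max_diameter = max(
        (distance(x, y) for c in clusters for x, y in combinations(c, 2)),
        default=0.0,
    )
    # Smallest distance between two records in different clusters.
    min_separation = min(
        distance(x, y)
        for c1, c2 in combinations(clusters, 2)
        for x in c1
        for y in c2
    )
    if max_diameter == 0.0:
        return float("inf")  # every cluster is a single point or all duplicates
    return min_separation / max_diameter


# Hypothetical toy records for one ambiguous author name.
r1 = {"saha", "mondal", "clustering"}
r2 = {"saha", "mondal", "dblp"}
r3 = {"kumar", "vision"}
r4 = {"kumar", "vision", "cnn"}

plausible = [[r1, r2], [r3, r4]]   # records grouped by shared coauthors
shuffled = [[r1, r3], [r2, r4]]    # unrelated records mixed together

print(dunn_index(plausible, jaccard_distance))
print(dunn_index(shuffled, jaccard_distance))
```

Passing the distance as a parameter mirrors the idea in the abstract of plugging different distance measures into the definition of an internal index; any other record distance with the same two-argument form could be substituted.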
Pages: 93-98
Page count: 6