Chinese Personal Name Disambiguation Based on Clustering

Cited by: 1
Authors
Fan, Chao [1, 2]
Li, Yu [1, 2]
Affiliations
[1] Jiangnan Univ, Sch Artificial Intelligence & Comp Sci, Wuxi 214122, Jiangsu, Peoples R China
[2] Jiangnan Univ, Jiangsu Key Lab Media Design & Software Technol, Wuxi 214122, Jiangsu, Peoples R China
DOI
10.1155/2021/3790176
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Personal name disambiguation is a significant problem in natural language processing and underlies many automatic information processing tasks. This research explores Chinese personal name disambiguation based on clustering. First, preprocessing transforms the raw corpus into a standardized format. Lexical analysis then performs Chinese word segmentation, part-of-speech tagging, and named entity recognition. Next, we extract features that better disambiguate Chinese personal names, and we create rules for identifying target personal names to improve the experimental results. Several feature-weighting methods are implemented, including Boolean weight, absolute frequency weight, tf-idf weight, and entropy weight. For clustering, agglomerative hierarchical clustering is selected after comparison with other clustering methods. Finally, a labeling approach is employed to bring forward the feature words that represent each cluster. The experiment achieves good results on five groups of Chinese personal names.
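The pipeline in the abstract (feature weighting, agglomerative hierarchical clustering, cluster labeling) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the similarity measure, linkage criterion, or stopping rule, so cosine similarity, average linkage, and the `threshold` value below are assumptions, as are the toy English tokens standing in for segmented Chinese feature words and every function name.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Turn token lists into sparse {term: tf-idf weight} dicts."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: (c / len(doc)) * math.log(n / df[t]) for t, c in Counter(doc).items()}
        for doc in docs
    ]

def cosine(u, v):
    """Cosine similarity of two sparse vectors (assumed measure)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0

def agglomerative(vectors, threshold=0.1):
    """Average-link agglomerative clustering; stops when the best
    pairwise cluster similarity falls below `threshold` (assumed rule)."""
    clusters = [[i] for i in range(len(vectors))]

    def link(a, b):
        total = sum(cosine(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        best, (x, y) = max(
            (link(a, b), (x, y))
            for x, a in enumerate(clusters)
            for y, b in enumerate(clusters)
            if x < y
        )
        if best < threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters

def label(cluster, vectors, k=2):
    """Pick the k feature words with the highest summed weight in a cluster."""
    total = Counter()
    for i in cluster:
        total.update(vectors[i])
    return [term for term, _ in total.most_common(k)]

# Toy documents that all mention the same (hypothetical) personal name.
docs = [
    ["professor", "university", "physics", "research"],
    ["basketball", "team", "coach", "season"],
    ["university", "physics", "lecture", "professor"],
    ["coach", "basketball", "league", "season"],
]
vectors = tfidf_vectors(docs)
clusters = agglomerative(vectors)
for c in clusters:
    print(sorted(c), label(c, vectors))
```

On this toy input the two university documents end up in one cluster and the two basketball documents in another, each cluster corresponding to one distinct person. Swapping `tfidf_vectors` for a Boolean, absolute-frequency, or entropy weighting would leave the clustering and labeling steps unchanged.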
Pages: 7