Vietnamese Author Name Disambiguation for Integrating Publications from Heterogeneous Sources

Cited by: 0
Authors
Tin Huynh [1 ]
Kiem Hoang [1 ]
Tien Do [1 ]
Duc Huynh [1 ]
Affiliation
[1] Univ Informat Technol, Ho Chi Minh City, Vietnam
Source
INTELLIGENT INFORMATION AND DATABASE SYSTEMS (ACIIDS 2013), PT I, 2013, Vol. 7802
Keywords
Digital Library; Data Integration; Bibliographical Data; Author Disambiguation; Machine Learning;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Automatic integration of bibliographic data from various sources is a critical task in the field of digital libraries, and author name disambiguation is one of its most important challenges. In this paper, we apply a supervised learning approach and propose a set of features for training classifiers to disambiguate Vietnamese author names. To evaluate the effectiveness of the proposed feature set, we experimented with five supervised learning methods: Random Forest, Support Vector Machine (SVM), k-Nearest Neighbors (kNN), C4.5 (decision tree), and Bayes. The experimental dataset was collected from three online digital libraries: Microsoft Academic Search, ACM Digital Library, and IEEE Digital Library. Our experiments showed that the kNN, Random Forest, and C4.5 classifiers outperform the others: the average accuracy was approximately 94.55% with kNN, 94.23% with Random Forest, 93.98% with C4.5, and 91.91% with SVM, while Bayes was the lowest at 81.56%. Overall, the highest accuracy we achieved for author name disambiguation with the proposed feature set on the Vietnamese author dataset was 98.39%.
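The abstract frames disambiguation as supervised classification over a proposed feature set but does not list the features themselves. The following is a minimal sketch under stated assumptions, not the authors' method: each pair of records that share an ambiguous name becomes one example; four hypothetical pairwise features are computed (name-token overlap after stripping Vietnamese diacritics, coauthor overlap, a same-venue flag, and title-word overlap); and the pair is labeled by a hand-rolled k-nearest-neighbors vote, kNN being one of the five classifiers the authors compared. The `Publication` record, the feature choices, `k=3`, and the `TRAIN` vectors are all illustrative assumptions, not the paper's feature set or data.

```python
import math
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class Publication:
    # Hypothetical record layout; the paper's actual fields are not given here.
    author: str             # the ambiguous author name as printed in this source
    coauthors: frozenset    # other author names on the same paper
    venue: str              # journal or conference name
    title_words: frozenset  # lowercased title tokens


def normalize_name(name: str) -> str:
    """Lowercase, collapse whitespace, and strip Vietnamese diacritics."""
    # "đ"/"Đ" have no Unicode decomposition, so map them to "d"/"D" by hand.
    name = name.replace("đ", "d").replace("Đ", "D")
    # NFD separates base letters from combining marks (tones, horn, breve, circumflex).
    decomposed = unicodedata.normalize("NFD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(stripped.lower().split())


def jaccard(a, b) -> float:
    """Overlap of two collections as |A & B| / |A | B| (0.0 if both are empty)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def pair_features(p: Publication, q: Publication) -> list:
    """Hypothetical feature vector for two records that share an ambiguous name."""
    # Token sets make "Nguyen Van An" and "An Nguyen Van" (reordered) compare equal.
    name_sim = jaccard(normalize_name(p.author).split(), normalize_name(q.author).split())
    coauthor_sim = jaccard({normalize_name(n) for n in p.coauthors},
                           {normalize_name(n) for n in q.coauthors})
    same_venue = 1.0 if p.venue.strip().lower() == q.venue.strip().lower() else 0.0
    title_sim = jaccard(p.title_words, q.title_words)
    return [name_sim, coauthor_sim, same_venue, title_sim]


def knn_predict(train, x, k=3) -> int:
    """Majority vote of the k nearest labeled pairs (1 = same person, 0 = different)."""
    nearest = sorted((math.dist(f, x), label) for f, label in train)[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


# Made-up labeled pair vectors for illustration only (not the paper's data).
TRAIN = [
    ([1.0, 0.8, 1.0, 0.5], 1),
    ([1.0, 0.6, 0.0, 0.4], 1),
    ([1.0, 0.5, 1.0, 0.3], 1),
    ([1.0, 0.0, 0.0, 0.0], 0),
    ([1.0, 0.0, 1.0, 0.1], 0),
    ([0.67, 0.0, 0.0, 0.0], 0),
]

if __name__ == "__main__":
    # Hypothetical records: a and b differ only in diacritics and venue casing.
    a = Publication("Nguyễn Văn An", frozenset({"Trần Thị Bình"}), "ACIIDS",
                    frozenset({"author", "disambiguation"}))
    b = Publication("Nguyen Van An", frozenset({"Tran Thi Binh"}), "aciids",
                    frozenset({"author", "name", "disambiguation"}))
    c = Publication("Nguyen Van An", frozenset({"Le Minh"}), "Bioinformatics",
                    frozenset({"protein", "folding"}))
    print(pair_features(a, b), knn_predict(TRAIN, pair_features(a, b)))
    print(pair_features(a, c), knn_predict(TRAIN, pair_features(a, c)))
```

Stripping diacritics and comparing name tokens as sets targets two ways records of one person can plausibly differ across sources, namely dropped Vietnamese tone marks and given-name-first versus family-name-first ordering. In a real system the hand-rolled vote would typically be replaced by a library classifier trained on labeled pairs.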
Pages: 226 - 235
Page count: 10
Related papers (50 in total)
  • [1] Author Name Disambiguation Based on Heterogeneous Graph
    Ma, Chuang
    Xia, Helong
    Journal of Computers (Taiwan), 2023, 34 (04) : 41 - 52
  • [2] Author Name Disambiguation Based on Heterogeneous Information Network
    Qiping D.
    Weijing C.
    Ling J.
    Yu’e Z.
    Data Analysis and Knowledge Discovery, 2022, 6 (04) : 60 - 68
  • [3] An Effective Author Name Disambiguation Framework for Large-Scale Publications
    Zhou, Anji
    Shi, Minghui
    Yuan, Rui
    IEEE ACCESS, 2024, 12 : 182086 - 182100
  • [4] Improving Similarity Measures for Publications with Special Focus on Author Name Disambiguation
    Shoaib, Muhammad
    Daud, Ali
    Khiyal, Malik Sikandar Hayat
    Arabian Journal for Science and Engineering, 2015, 40 (06) : 1591 - 1605
  • [6] Author Name Disambiguation
    Smalheiser, Neil R.
    Torvik, Vetle I.
    ANNUAL REVIEW OF INFORMATION SCIENCE AND TECHNOLOGY, 2009, 43 : 287 - 313
  • [7] Author Name Disambiguation via Heterogeneous Network Embedding from Structural and Semantic Perspectives
    Xie, Wenjin
    Liu, Siyuan
    Wang, Xiaomeng
    Jia, Tao
    2022 IEEE 34TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, ICTAI, 2022, : 245 - 250
  • [8] Author Name Disambiguation on Heterogeneous Information Network with Adversarial Representation Learning
    Wang, Haiwen
    Wang, Ruijie
    Wen, Chuan
    Li, Shuhao
    Jia, Yuting
    Zhang, Weinan
    Wang, Xinbing
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 238 - 245
  • [9] Dual-Channel Heterogeneous Graph Network for Author Name Disambiguation
    Zheng, Xin
    Zhang, Pengyu
    Cui, Yanjie
    Du, Rong
    Zhang, Yong
    INFORMATION, 2021, 12 (09)
  • [10] Author Name Disambiguation for PubMed
    Liu, Wanli
    Dogan, Rezarta Islamaj
    Kim, Sun
    Comeau, Donald C.
    Kim, Won
    Yeganova, Lana
    Lu, Zhiyong
    Wilbur, W. John
    JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY, 2014, 65 (04) : 765 - 781