Improved scalability in mining using ontology record linkage algorithm

Cited by: 0
Authors
Prabhu, T. [1]
Dhas, C. Suresh Gnana [2]
Affiliations
[1] Manonmaniam Sundaranar Univ, Dept Comp Sci & Engn, Thirunelveli 627012, Tamil Nadu, India
[2] Vivekanadha Coll Engn Women, Dept Comp Sci & Engn, Tiruchengode 637205, Tamil Nadu, India
Keywords
Record linkage; Data mining; Angle based neighborhood; Ontology; Conventional method; INJURIES
DOI
10.1016/j.compeleceng.2018.01.026
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Record linkage plays a broad role in identifying records and matching related datasets. Conventional approaches use a probabilistic method to identify reliable and unique datasets. Probabilistic record linkage exploits the data that are common to an individual record pair. Classical methods link records by testing equality on common fields. Therefore, errors associated with record linkage reduce scalability. In this paper, the similarity between individual values of record pairs is improved using an ontology-based semantic similarity model. Semantic similarity between the records is tested successfully using an angle-based neighborhood graph. To validate the proposed approach, a conventional record linkage algorithm is compared with the angle-based neighborhood ontology record linkage technique, which achieves improved accuracy and scalability. Finally, the accuracy of identifying semantically similar matches scales better in the proposed technique than in conventional methods. (C) 2018 Elsevier Ltd. All rights reserved.
Pages: 511-519
Number of pages: 9
Related papers
16 in total
  • [1] Supervised learning using a symmetric bilinear form for record linkage
    Abril, Daniel
    Torra, Vicenc
    Navarro-Arribas, Guillermo
    [J]. INFORMATION FUSION, 2015, 26 : 144 - 153
  • [2] Improving record linkage with supervised learning for disclosure risk assessment
    Abril, Daniel
    Navarro-Arribas, Guillermo
    Torra, Vicenc
    [J]. INFORMATION FUSION, 2012, 13 (04) : 274 - 284
  • [3] Privacy preserving record linkage in the presence of missing values
    Chi, Yuan
    Hong, Jun
    Jurek, Anna
    Liu, Weiru
    O'Reilly, Dermot
    [J]. INFORMATION SYSTEMS, 2017, 71 : 199 - 210
  • [4] Revisiting distance-based record linkage for privacy-preserving release of statistical datasets
    Herranz, Javier
    Nin, Jordi
    Rodriguez, Pablo
    Tassa, Tamir
    [J]. DATA & KNOWLEDGE ENGINEERING, 2015, 100 : 78 - 93
  • [5] A novel ensemble learning approach to unsupervised record linkage
    Jurek, Anna
    Hong, Jun
    Chi, Yuan
    Liu, Weiru
    [J]. INFORMATION SYSTEMS, 2017, 71 : 40 - 54
  • [6] Li T, 2017, NEUROCOMPUTING
  • [7] Lu Y, 2017, TELEMATICS INF
  • [8] Improving record linkage performance in the presence of missing linkage data
    Ong, Toan C.
    Mannino, Michael V.
    Schilling, Lisa M.
    Kahn, Michael G.
    [J]. JOURNAL OF BIOMEDICAL INFORMATICS, 2014, 52 : 43 - 54
  • [9] Qualifying information on deaths and serious injuries caused by road traffic in five Brazilian capitals using record linkage
    Pimenta Mandacaru, Polyana Maria
    Andrade, Ana Lucia
    Rocha, Marli Souza
    Aguiar, Fernanda Pinheiro
    Nogueira, Maria Sueli M.
    Girodo, Anne Marielle
    Galas Pedrosa, Ana Amelia
    Alves de Oliveira, Vera Lidia
    Malheiros Alves, Marta Maria
    Paixao, Lucia Maria Miana M.
    Malta, Deborah Carvalho
    Alves Silva, Marta Maria
    de Morais Neto, Otaliba Libanio
    [J]. ACCIDENT ANALYSIS AND PREVENTION, 2017, 106 : 392 - 398
  • [10] Privacy-preserving record linkage on large real world datasets
    Randall, Sean M.
    Ferrante, Anna M.
    Boyd, James H.
    Bauer, Jacqueline K.
    Semmens, James B.
    [J]. JOURNAL OF BIOMEDICAL INFORMATICS, 2014, 50 : 205 - 212