Ontology Alignment Based on Word Embedding and Random Forest Classification

Cited by: 9
Authors
Nkisi-Orji, Ikechukwu [1]
Wiratunga, Nirmalie [1]
Massie, Stewart [1]
Hui, Kit-Ying [1]
Heaven, Rachel [2]
Affiliations
[1] Robert Gordon University, Aberdeen, Scotland
[2] British Geological Survey, Nottingham, England
Source
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2018, PT I | 2019 / Vol. 11051
Keywords
Ontology alignment; Word embedding; Machine classification; Semantic web; Aggregation
DOI
10.1007/978-3-030-10925-7_34
CLC classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Ontology alignment is crucial for integrating heterogeneous data sources and forms an important component of the semantic web. Accordingly, several ontology alignment techniques have been proposed and used for discovering correspondences between the concepts (or entities) of different ontologies. Most alignment techniques depend on string-based similarities, which are unable to handle the vocabulary mismatch problem. In addition, determining which similarity measures to use, and how to combine them effectively in alignment systems, are challenges that have persisted in this area. In this work, we introduce a random forest classifier approach for ontology alignment which relies on word embedding for determining a variety of semantic similarity features between concepts. Specifically, we combine string-based and semantic similarity measures to form feature vectors that are used by the classifier model to determine when concepts align. By harnessing background knowledge and relying on minimal information from the ontologies, our approach can handle knowledge-light ontological resources. It also eliminates the need for learning the aggregation weights of a composition of similarity measures. Experiments using the Ontology Alignment Evaluation Initiative (OAEI) dataset and real-world ontologies highlight the utility of our approach and show that it can outperform state-of-the-art alignment systems. Code related to this paper is available at: https://bitbucket.org/paravariar/rafcom.
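The abstract describes building one feature vector per concept pair from string-based and word-embedding similarities, and letting a random forest decide whether the pair aligns. The following is a minimal, standard-library Python sketch of that idea under stated assumptions; it is not the authors' implementation (their code is at the Bitbucket repository given in the abstract). Assumptions: `TOY_EMBEDDINGS` is a hypothetical, made-up stand-in for pretrained word vectors; the three features (a `difflib.SequenceMatcher` edit ratio, token Jaccard overlap, and cosine similarity of averaged word vectors) were chosen for illustration; and the classifier is a toy random forest of bagged decision stumps (depth-1 trees), swapped in for a full random forest library. All function names are hypothetical.

```python
import math
import random
from difflib import SequenceMatcher

# Hypothetical toy word vectors standing in for pretrained embeddings;
# the values are invented purely for illustration.
TOY_EMBEDDINGS = {
    "rock": [0.90, 0.10, 0.05],
    "stone": [0.85, 0.15, 0.05],
    "sediment": [0.70, 0.30, 0.10],
    "river": [0.10, 0.90, 0.20],
    "stream": [0.15, 0.85, 0.25],
    "fault": [0.20, 0.10, 0.90],
    "type": [0.30, 0.30, 0.30],
}


def tokens(label):
    """Lower-case a concept label and split it on spaces and underscores."""
    return label.lower().replace("_", " ").split()


def mean_vector(label):
    """Average the vectors of a label's known tokens; None if no token is known."""
    vecs = [TOY_EMBEDDINGS[t] for t in tokens(label) if t in TOY_EMBEDDINGS]
    if not vecs:
        return None
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def features(a, b):
    """Feature vector for a concept pair: [edit ratio, token Jaccard, embedding cosine]."""
    ta, tb = set(tokens(a)), set(tokens(b))
    edit = SequenceMatcher(None, a.lower(), b.lower()).ratio()  # string-based
    union = ta | tb
    jaccard = len(ta & tb) / len(union) if union else 0.0  # token overlap
    va, vb = mean_vector(a), mean_vector(b)
    semantic = cosine(va, vb) if va and vb else 0.0  # embedding-based
    return [edit, jaccard, semantic]


def gini(labels):
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) ** 2


def majority(labels, default):
    return int(2 * sum(labels) >= len(labels)) if labels else default


def fit_stump(X, y, feature_ids):
    """Depth-1 tree: choose the (feature, threshold) split with lowest weighted Gini."""
    default = majority(y, 0)
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left, default), majority(right, default))
    return best[1:]  # (feature, threshold, left_label, right_label)


def fit_forest(X, y, n_trees=25, seed=0):
    """Bagged stumps, each restricted to a random subset of sqrt(d) features."""
    rng = random.Random(seed)
    d = len(X[0])
    k = max(1, int(math.sqrt(d)))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        feats = rng.sample(range(d), k)  # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, x):
    """Majority vote of the stumps: 1 means the concepts align, 0 means they do not."""
    votes = sum(lo if x[f] <= t else hi for f, t, lo, hi in forest)
    return int(2 * votes > len(forest))


if __name__ == "__main__":
    # Hypothetical labelled concept pairs (1 = aligned, 0 = not aligned).
    pairs = [
        ("Rock Type", "rock_type", 1), ("Stream", "River", 1),
        ("Rock", "Stone", 1), ("Fault", "fault", 1),
        ("Rock Type", "River", 0), ("Stream", "Fault", 0),
        ("Sediment", "Fault", 0), ("Stone", "stream", 0),
    ]
    X = [features(a, b) for a, b, _ in pairs]
    y = [label for _, _, label in pairs]
    forest = fit_forest(X, y)
    print(predict(forest, features("River", "stream")))
```

Pairs such as "Stream"/"River" illustrate the vocabulary mismatch problem the abstract mentions: the string features are low while the embedding feature is high. Because the classifier learns how much each feature matters from labelled pairs, no aggregation weights have to be set by hand. A practical system would use real pretrained vectors and full-depth trees rather than these toy stand-ins.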
Pages: 557-572
Page count: 16
Related papers (50 in total)
  • [31] DeepPatent: patent classification with convolutional neural networks and word embedding
    Li, Shaobo
    Hu, Jie
    Cui, Yuxin
    Hu, Jianjun
    SCIENTOMETRICS, 2018, 117 (02) : 721 - 744
  • [32] Unsupervised Feature Selection for Text Classification via Word Embedding
    Rui, Weikang
    Liu, Jinwen
    Jia, Yawei
    PROCEEDINGS OF 2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA ANALYSIS (ICBDA), 2016, : 37 - 41
  • [33] Word Embedding Composition for Data Imbalances in Sentiment and Emotion Classification
    Xu, Ruifeng
    Chen, Tao
    Xia, Yunqing
    Lu, Qin
    Liu, Bin
    Wang, Xuan
    COGNITIVE COMPUTATION, 2015, 7 : 226 - 240
  • [34] Dynamically Jointing character and word embedding for Chinese text Classification
    Tang, Xuetao
    Hu, Xuegang
    Li, Peipei
    11TH IEEE INTERNATIONAL CONFERENCE ON KNOWLEDGE GRAPH (ICKG 2020), 2020, : 336 - 343
  • [35] Unsupervised Word Sense Disambiguation based on Word Embedding and Collocation
    Han, Shangzhuang
    Shirai, Kiyoaki
    ICAART: PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE - VOL 2, 2021, : 1218 - 1225
  • [36] EVOLUTIONARY COMBINATORIAL OPTIMIZATION FOR WORD EMBEDDING (ECOWE) IN SENTIMENT CLASSIFICATION
    Gunasegaran, Thineswaran
    Cheah, Yu-N
    MALAYSIAN JOURNAL OF COMPUTER SCIENCE, 2019, : 34 - 45
  • [37] Mutual Information-Based Word Embedding for Unsupervised Cross-Domain Sentiment Classification
    Liang, Junge
    Ma, Lei
    Xiong, Xin
    Shao, Dangguo
    Xiang, Yan
    Wang, Xiongbing
    2019 IEEE 4TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND BIG DATA ANALYSIS (ICCCBDA), 2019, : 625 - 628
  • [38] A New Text Classification Model Based on Contrastive Word Embedding for Detecting Cybersecurity Intelligence in Twitter
    Shin, Han-Sub
    Kwon, Hyuk-Yoon
    Ryu, Seung-Jin
    ELECTRONICS, 2020, 9 (09) : 1 - 21
  • [40] A topic-enhanced word embedding for Twitter sentiment classification
    Ren, Yafeng
    Wang, Ruimin
    Ji, Donghong
    INFORMATION SCIENCES, 2016, 369 : 188 - 198