Mapping sentences to concept transferred space for semantic textual similarity

Cited by: 0
Authors
Heyan Huang
Hao Wu
Xiaochi Wei
Yang Gao
Shumin Shi
Affiliations
[1] Beijing Institute of Technology, School of Computer Science and Technology
[2] Capital Normal University, Beijing Advanced Innovation Center for Imaging Technology
Source
Knowledge and Information Systems | 2019 / Vol. 60
Keywords
Semantic textual similarity; Concept transferred space; Information content; WordNet
DOI
Not available
Abstract
Semantic textual similarity (STS) seeks to assess the degree of semantic equivalence between two sentences or text snippets. Most STS methods operate on the surface forms of words and treat words as semantically unrelated symbols, so they cannot capture the ubiquitous conceptual associations among words. Recently, the concept transferred space (CTS) was proposed to address this word conceptual association problem. It is generated from the noun concepts in WordNet together with their IS-A relations. However, the CTS-based model can only handle nouns; as a result, a large number of words, namely verbs, adjectives, and adverbs as well as out-of-vocabulary named entities (OOV NEs), are neglected, which causes information loss in the semantic similarity evaluation. This paper presents ways to solve this problem. To involve words other than nouns, derivational links in WordNet are employed to associate verbs, adjectives, and adverbs with their corresponding noun concepts. To prevent the information loss caused by OOV NEs, the quantity of information they add is predicted according to the tendency learned from known NEs. Moreover, to further improve the accuracy of the CTS-based model, the importance of different types of words is taken into account by assigning corresponding weights to them.
Experimental results suggest that the proposed comprehensive CTS-based model achieves a significant improvement over the primitive model, which lacks the non-nominal words, OOV NEs, and word weights, and that it also outperforms the state-of-the-art systems of each year at the *SEM/SemEval 2013–2016 STS tasks. Additionally, at the SemEval 2017 STS task, our team, using the comprehensive CTS-based model, ranked second among all teams and first on the Track 1 dataset.
Pages: 1353-1376
Page count: 23