Data acquisition - Glossaries - Information management - Information retrieval - Semantics - Thesauri;
DOI:
10.1527/tjsai.17.539
Chinese Library Classification number:
Subject classification number:
Abstract:
There have been several previous studies on measuring the semantic similarity between words whose concepts are represented as points in a multi-dimensional vector space acquired from text data such as electronic dictionaries or text corpora. A central problem in these studies is how to select orthonormal basis vectors for the space which represents attributes of the words. We propose a method of building the space by combining two representative methods, one using singular value decomposition and the other using the contents of a thesaurus. The proposed method was evaluated both for the purposes of similar word retrieval and for document retrieval. The evaluations showed that the proposed combination is more effective than either of the original methods alone for both of these tasks.
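The abstract does not say how the two spaces are combined, so the following is only a minimal sketch of one possible reading, not the authors' method. It assumes that word vectors from a truncated singular value decomposition of a co-occurrence matrix are concatenated with indicator vectors of thesaurus categories, and that similarity is measured by cosine. The toy words, counts, category labels, the `weight` parameter, and all function names are hypothetical.

```python
import numpy as np


def svd_word_vectors(cooc, k):
    """Project words onto the top-k left singular directions of a word-by-context matrix."""
    u, s, _vt = np.linalg.svd(cooc, full_matrices=False)
    return u[:, :k] * s[:k]


def thesaurus_vectors(words, categories):
    """Build one indicator dimension per thesaurus category (hypothetical labels)."""
    labels = sorted({c for w in words for c in categories[w]})
    index = {c: j for j, c in enumerate(labels)}
    vecs = np.zeros((len(words), len(labels)))
    for i, w in enumerate(words):
        for c in categories[w]:
            vecs[i, index[c]] = 1.0
    return vecs


def unit_rows(m):
    """Scale each row to unit length, leaving all-zero rows unchanged."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    return m / np.where(norms == 0, 1.0, norms)


def combined_vectors(cooc, words, categories, k=2, weight=0.5):
    """Assumed combination: weighted concatenation of the two normalized spaces."""
    svd_part = unit_rows(svd_word_vectors(cooc, k))
    thesaurus_part = unit_rows(thesaurus_vectors(words, categories))
    return np.hstack([(1.0 - weight) * svd_part, weight * thesaurus_part])


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy data, invented for illustration only.
words = ["dog", "cat", "car", "truck"]
cooc = np.array([
    [3.0, 2.0, 0.0, 0.0],  # dog:   counts with contexts pet, bark, road, engine
    [3.0, 0.0, 0.0, 0.0],  # cat
    [0.0, 0.0, 3.0, 2.0],  # car
    [0.0, 0.0, 2.0, 3.0],  # truck
])
categories = {
    "dog": {"animal"},
    "cat": {"animal"},
    "car": {"vehicle"},
    "truck": {"vehicle"},
}

vectors = combined_vectors(cooc, words, categories, k=2, weight=0.5)
sim_dog_cat = cosine(vectors[0], vectors[1])
sim_dog_car = cosine(vectors[0], vectors[2])
print("dog vs cat:", sim_dog_cat)
print("dog vs car:", sim_dog_car)
```

In this sketch, `weight` controls how much the thesaurus dimensions contribute relative to the SVD dimensions. Setting it to 0 or 1 falls back to one of the two original spaces, which is one way to compare the combination against each method alone.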