A Grammar-Based Semantic Similarity Algorithm for Natural Language Sentences

Cited by: 34
Authors
Lee, Ming Che [1]
Chang, Jia Wei [2]
Hsieh, Tung Cheng [3]
Affiliations
[1] Ming Chuan Univ, Dept Comp & Commun Engn, Taoyuan 333, Taiwan
[2] Natl Cheng Kung Univ, Dept Engn Sci, Tainan 701, Taiwan
[3] Hsuan Chuang Univ, Dept Visual Commun Design, Hsinchu 300, Taiwan
Source
SCIENTIFIC WORLD JOURNAL | 2014
Keywords
INFORMATION; PRINCIPLES; EXTRACTION; RETRIEVAL; WORDNET
DOI
10.1155/2014/437162
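Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification codes
07; 0710; 09
Abstract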
This paper presents a similarity algorithm for natural language sentences based on grammar and a semantic corpus. Natural language, as opposed to "artificial language" such as computer programming languages, is the language used by the general public for daily communication. Traditional information retrieval approaches, such as vector models, LSA, HAL, or even ontology-based approaches that extend the comparison from co-occurring terms/words to concept similarity, may fail to find the right match when there is no obvious relation or concept overlap between two natural language sentences. This paper proposes a sentence similarity algorithm that takes advantage of a corpus-based ontology and grammatical rules to overcome these problems. Experiments on two well-known benchmarks demonstrate that the proposed algorithm achieves a significant performance improvement on sentences/short texts with arbitrary syntax and structure.
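The limitation the abstract attributes to vector models can be illustrated with a minimal sketch. This is not the paper's algorithm: it is a plain bag-of-words cosine similarity, and the function name `cosine_bow`, the whitespace tokenization, and the two example sentences are assumptions chosen for illustration. When two sentences share no surface tokens, the score is zero even if they mean nearly the same thing.

```python
import math
from collections import Counter


def cosine_bow(a: str, b: str) -> float:
    """Cosine similarity of raw term-count vectors.

    Illustrative only: lowercased whitespace tokens, no stemming,
    no stop-word removal, no term weighting.
    """
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    # Only tokens present in both sentences contribute to the dot product.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    sq_a = sum(c * c for c in va.values())
    sq_b = sum(c * c for c in vb.values())
    norm = math.sqrt(sq_a * sq_b)
    return dot / norm if norm else 0.0


# Hypothetical paraphrase pair with no words in common.
s1 = "kids enjoy sweets"
s2 = "children love candy"

print(cosine_bow(s1, s2))  # no shared tokens, so the dot product is 0
print(cosine_bow(s1, s1))  # identical sentences score at the maximum
```

A pure term-overlap measure cannot separate this paraphrase pair from two unrelated sentences, which is the gap that approaches using lexical resources such as WordNet and grammatical structure aim to close.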
Pages: 17