Semantic Representation of Malayalam Text Documents in Cricket Domain Using WordNet

Cited by: 0
Authors
Kumar, Sreedhi Deleep [1 ]
Reshma, E. U. [1 ]
Sunitha, C. [1 ]
Ganesh, Amal [1 ]
Affiliations
[1] Vidya Acad Sci & Technol, Comp Sci Dept, Trichur, India
Source
International Conference on Intelligent Data Communication Technologies and Internet of Things, ICICI 2018 | 2019 / Vol. 26
Keywords
Semantics; Malayalam; Cricket; WordNet; Semantic triplets;
DOI
10.1007/978-3-030-03146-6_49
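Chinese Library Classification (CLC) number
TN [Electronic technology, communication technology];
Discipline classification code
0809 ;
Abstract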
Semantic representation is an abstract language for representing the meaning of text. Representing sentences semantically supports applications such as question answering, information extraction, summarization, and machine translation. Various methods have been used to represent text documents, but only limited work has been done for the Malayalam language. A specific domain, cricket, is chosen so as to obtain better results in semantic representation. A Malayalam lexical database (WordNet) is used as the resource for obtaining the required information; WordNet is a hierarchically organized lexical resource that can be built for any language. In this project, the semantic representation is extracted from a single Malayalam text document, producing an abstractive representation of the given input. The extraction proceeds through several stages. Tokenization separates the words of each sentence into tokens, and POS tagging labels these tokens as nouns, verbs, adjectives, and so on. The tagged tokens then undergo morphological analysis, which finds the stem word for each token. After the analysis, details about the stem words are obtained by looking them up in WordNet. Next, the semantic triplets (Subject, Object, Predicate) are extracted from each sentence. These triplets are used to build the semantic representation, with the verb taken as the root element. The aim of this project is the semantic representation of Malayalam text documents in the cricket domain using the WordNet database.
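
The stages named in the abstract can be sketched as a small pipeline. This is a minimal illustration and not the authors' implementation: the tiny English dictionaries below are hypothetical stand-ins for a Malayalam POS tagger, morphological analyzer, and WordNet, and the triplet rule (first noun as subject, second noun as object, first verb as predicate) is a naive assumption chosen only to keep the example short.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple

# Hypothetical toy resources; a real system would use a Malayalam
# POS tagger, morphological analyzer, and WordNet instead.
POS_LEXICON: Dict[str, str] = {
    "kohli": "NOUN", "scored": "VERB", "two": "NUM", "centuries": "NOUN",
}
STEMS: Dict[str, str] = {"scored": "score", "centuries": "century"}
TOY_WORDNET: Dict[str, str] = {  # stem -> hypernym (assumed entries)
    "kohli": "batsman", "century": "score", "score": "achieve",
}


class Triplet(NamedTuple):
    subject: Optional[str]
    obj: Optional[str]
    predicate: Optional[str]


def tokenize(sentence: str) -> List[str]:
    """Split on whitespace and strip surrounding punctuation."""
    tokens = (t.strip(".,;:!?\"'").lower() for t in sentence.split())
    return [t for t in tokens if t]


def pos_tag(tokens: List[str]) -> List[Tuple[str, str]]:
    """Tag each token by dictionary lookup; unknown words get 'UNK'."""
    return [(t, POS_LEXICON.get(t, "UNK")) for t in tokens]


def stem(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Replace each token with its stem, keeping the POS tag."""
    return [(STEMS.get(t, t), tag) for t, tag in tagged]


def extract_triplet(stemmed: List[Tuple[str, str]]) -> Triplet:
    """Naive rule: first noun = subject, second noun = object, first verb = predicate."""
    nouns = [w for w, tag in stemmed if tag == "NOUN"]
    verbs = [w for w, tag in stemmed if tag == "VERB"]
    return Triplet(
        subject=nouns[0] if nouns else None,
        obj=nouns[1] if len(nouns) > 1 else None,
        predicate=verbs[0] if verbs else None,
    )


def represent(triplet: Triplet) -> dict:
    """Build a representation rooted at the verb, annotated from the toy WordNet."""
    def node(word: Optional[str]) -> Optional[dict]:
        if word is None:
            return None
        return {"word": word, "hypernym": TOY_WORDNET.get(word)}

    return {
        "root": node(triplet.predicate),
        "subject": node(triplet.subject),
        "object": node(triplet.obj),
    }


def analyze(sentence: str) -> dict:
    """Run all stages in order on a single sentence."""
    return represent(extract_triplet(stem(pos_tag(tokenize(sentence)))))


if __name__ == "__main__":
    print(analyze("Kohli scored two centuries."))
```

Because the rule relies on the order of nouns and not on their position relative to the verb, the same sketch would apply to a verb-final sentence, although a real extractor would need case markers and dependency information to assign roles reliably.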
Pages: 439-447
Page count: 9