Automatic Assamese Text Categorization Using WordNet

Cited by: 0
Authors
Sarmah, Jumi [1]
Barman, Anup Kumar [1]
Sarma, Shikhar Kr. [1]
Affiliations
[1] Gauhati Univ, Dept Informat Technol, Gauhati, India
Source
2013 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI) | 2013
Keywords
Text Categorization; Assamese WordNet; Word Sense Disambiguation;
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The growing volume of Assamese text in digital form motivates a system that categorizes it automatically. This paper describes such a system, which categorizes texts using knowledge from the Assamese WordNet. In WordNet, a synset groups the words that express the same concept, and the approach disambiguates words that have more than one sense in a given text. It extracts the words that occur in a document and uses them to build a synset vector by adding each word's corresponding synsets from WordNet. To improve performance, the weighting step increases the weight not only of the terms but also of the synsets corresponding to those terms. The occurrences of the senses are then counted, which helps disambiguation by propagating the relationships between synsets. The proposed method achieves reasonable, state-of-the-art accuracy when measured by precision and recall.
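The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy lexicon `TOY_WORDNET`, the English stand-in words, the `related` relation (two synsets count as related when they share a domain prefix), and the function names `disambiguate` and `synset_vector` are all assumptions made for illustration, since the record gives no details of the Assamese WordNet interface or of the weighting scheme.

```python
from collections import Counter

# Hypothetical toy lexicon standing in for Assamese WordNet:
# each word maps to the ids of its candidate synsets (senses).
TOY_WORDNET = {
    "bank": ["finance.institution", "river.edge"],
    "loan": ["finance.credit"],
    "money": ["finance.currency"],
    "river": ["river.stream"],
    "water": ["river.liquid"],
}


def related(a, b):
    """Assumed synset relation: two synsets are related if they share a domain prefix."""
    return a.split(".")[0] == b.split(".")[0]


def disambiguate(word, context_words, wordnet):
    """Choose the sense of `word` with the most related synsets in the context."""
    senses = wordnet.get(word, [])
    if not senses:
        return None
    # Collect the synsets of every other word in the document.
    context_synsets = [
        s for w in context_words if w != word for s in wordnet.get(w, [])
    ]
    # Count, for each candidate sense, how many context synsets relate to it.
    scores = {s: sum(related(s, c) for c in context_synsets) for s in senses}
    # On a tie, max() keeps the first listed sense.
    return max(senses, key=lambda s: scores[s])


def synset_vector(tokens, wordnet):
    """Build a vector that weights both the terms and their chosen synsets."""
    vec = Counter()
    for tok in tokens:
        vec[tok] += 1  # weight of the term itself
        sense = disambiguate(tok, tokens, wordnet)
        if sense is not None:
            vec[sense] += 1  # weight of the synset chosen for the term
    return vec


if __name__ == "__main__":
    doc = ["bank", "loan", "money", "bank"]
    print(synset_vector(doc, TOY_WORDNET))
```

In this sketch the resulting vector could be passed to any standard classifier; which classifier the paper uses is not stated in this record.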
Pages: 85-89
Page count: 5