Using WordNet for text categorization

Cited by: 0
Authors
Elberrichi, Zakaria
Rahmoun, Abdelattif
Bentaalah, Mohamed Amine
Keywords
20 Newsgroups; ontology; Reuters-21578; text categorization; WordNet; cosine distance
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper explores a method that uses WordNet concepts to categorize text documents. The bag-of-words model used for text representation is unsatisfactory because it ignores possible relations between terms. The proposed method extracts generic concepts from WordNet for all the terms in a text, then combines them with the terms in different ways to form a new representation vector. The effects of this method are examined in several experiments that use the multivariate chi-square statistic to reduce dimensionality, the cosine distance, and two benchmark corpora for evaluation: the Reuters-21578 newswire articles and the 20 Newsgroups data. The proposed method is especially effective in raising the macro-averaged F1 score, which increased from 0.649 to 0.714 on Reuters-21578 and from 0.667 to 0.719 on 20 Newsgroups.
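As a rough illustration of the kind of pipeline the abstract describes, the Python sketch below augments a term vector with generic concepts, assigns a document to the class whose centroid is most cosine-similar, and scores predictions with macro-averaged F1. It is not the authors' implementation. `TOY_HYPERNYMS` is a hypothetical hand-written stand-in for a WordNet hypernym lookup; the combination strategies `add`, `replace` and `only` are assumed examples of "combining concepts with terms in different ways"; the nearest-centroid classifier is an assumed choice; and the multivariate chi-square feature-selection step is omitted.

```python
import math
from collections import Counter

# HYPOTHETICAL stand-in for WordNet: each term maps to one more generic
# concept, roughly what a first-sense hypernym lookup might return.
# A real system would query WordNet (e.g. through NLTK) instead.
TOY_HYPERNYMS = {
    "wheat": "cereal", "corn": "cereal", "barley": "cereal",
    "crude": "oil", "petroleum": "oil",
}


def represent(tokens, strategy="add"):
    """Turn a token list into a sparse feature vector (a Counter).

    Assumed strategies, for illustration only:
      "terms"   - plain bag of words (baseline, no concepts)
      "add"     - keep every term and also count its concept
      "replace" - use the concept instead of the term when one exists
      "only"    - keep concepts only and drop all terms
    Concept features are prefixed with "C:" so they never collide with terms.
    """
    vec = Counter()
    for tok in tokens:
        concept = TOY_HYPERNYMS.get(tok)
        if strategy == "terms":
            vec[tok] += 1
        elif strategy == "add":
            vec[tok] += 1
            if concept:
                vec["C:" + concept] += 1
        elif strategy == "replace":
            vec[("C:" + concept) if concept else tok] += 1
        elif strategy == "only":
            if concept:
                vec["C:" + concept] += 1
        else:
            raise ValueError(f"unknown strategy: {strategy!r}")
    return vec


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vectors):
    """Sum of class vectors; cosine ignores scale, so no division is needed."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return total


def classify(vec, centroids):
    """Assign the class whose centroid is most cosine-similar to vec."""
    return max(centroids, key=lambda c: cosine(vec, centroids[c]))


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all classes seen in gold or pred."""
    f1s = []
    for c in sorted(set(gold) | set(pred)):
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)


# Tiny made-up training set (two classes, already tokenized).
train = {
    "grain": [["wheat", "harvest", "export"], ["corn", "harvest"]],
    "crude": [["crude", "price", "barrel"], ["petroleum", "price"]],
}


def build_centroids(strategy):
    """One centroid per class, built with the given representation strategy."""
    return {c: centroid(represent(d, strategy) for d in docs)
            for c, docs in train.items()}


query = ["barley"]  # "barley" never occurs in the training documents
print(classify(represent(query, "add"), build_centroids("add")))  # grain
```

The query `["barley"]` shares no term with the training documents, so under the plain `terms` representation its cosine similarity to both centroids is zero; the added concept feature `C:cereal` is what lets it match the grain class. This is the kind of term relation the bag-of-words model ignores.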
Pages: 16-24
Number of pages: 9
Related papers
  • [1] Automatic Assamese Text Categorization Using WordNet
    Sarmah, Jumi
    Barman, Anup Kumar
    Sarma, Shikhar Kr.
    2013 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2013: 85 - 89
  • [2] Fully Automatic Text Categorization by Exploiting WordNet
    Li, Jianqiang
    Zhao, Yu
    Liu, Bo
    INFORMATION RETRIEVAL TECHNOLOGY, PROCEEDINGS, 2009, 5839 : 1 - 12
  • [3] A WordNet-based approach to feature selection in text categorization
    Zhang, K
    Sun, J
    Wang, B
    INTELLIGENT INFORMATION PROCESSING II, 2005, 163 : 475 - 484
  • [4] Evaluation of Text Clustering Methods Using WordNet
    Amine, Abdelmalek
    Elberrichi, Zakaria
    Simonet, Michel
    INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2010, 7 (04) : 349 - 357
  • [5] An Approach to Automatic Text Summarization using WordNet
    Pal, Alok Ranjan
    Saha, Diganta
    SOUVENIR OF THE 2014 IEEE INTERNATIONAL ADVANCE COMPUTING CONFERENCE (IACC), 2014: 1169 - 1173
  • [6] Exploiting Ontology Recommendation Using Text Categorization Approach
    Sarwar, Muhammad Azeem
    Ahmed, Mansoor
    Habib, Asad
    Khalid, Muhammad
    Ali, M. Akhtar
    Raza, Mohsin
    Hussain, Shahid
    Ahmed, Ghufran
    IEEE ACCESS, 2021, 9 : 27304 - 27322
  • [7] A Comprehensive Analysis of using Semantic Information in Text Categorization
    Celik, Kerem
    Gungor, Tunga
    2013 IEEE INTERNATIONAL SYMPOSIUM ON INNOVATIONS IN INTELLIGENT SYSTEMS AND APPLICATIONS (IEEE INISTA), 2013,
  • [8] Using WordNet to disambiguate word senses for text classification
    Liu, Ying
    Scheuermann, Peter
    Li, Xingsen
    Zhu, Xingquan
    COMPUTATIONAL SCIENCE - ICCS 2007, PT 3, PROCEEDINGS, 2007, 4489 : 781 - +
  • [9] Biomedical text categorization using UMLS
    Perea Ortega, Jose Manuel
    Martin Valdivia, Maria Teresa
    Montejo Raez, Arturo
    Diaz Galiano, Manuel Carlos
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2008, (40): 121 - 127
  • [10] A semantic approach for text clustering using WordNet and lexical chains
    Wei, Tingting
    Lu, Yonghe
    Chang, Huiyou
    Zhou, Qiang
    Bao, Xianyu
    EXPERT SYSTEMS WITH APPLICATIONS, 2015, 42 (04) : 2264 - 2275