Using WordNet for text categorization

Authors
Elberrichi, Zakaria
Rahmoun, Abdelattif
Bentaalah, Mohamed Amine
Keywords
20 Newsgroups; ontology; Reuters-21578; text categorization; WordNet; cosine distance
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper explores a method that uses WordNet concepts to categorize text documents. The bag-of-words representation typically used for texts is unsatisfactory because it ignores possible relations between terms. The proposed method extracts generic concepts from WordNet for all the terms in a text and then combines them with the terms in different ways to form a new representative vector. The effects of this method are examined in several experiments that use the multivariate chi-square statistic to reduce dimensionality, the cosine distance as the similarity measure, and two benchmark corpora for evaluation: the Reuters-21578 newswire articles and the 20 Newsgroups data. The proposed method is especially effective in raising the macro-averaged F1 value, which increased from 0.649 to 0.714 on Reuters-21578 and from 0.667 to 0.719 on 20 Newsgroups.
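The idea of enriching a bag-of-words vector with generic concepts, and of comparing documents by cosine, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the HYPERNYMS dictionary is a hypothetical toy stand-in for real WordNet lookups, the strategy names ("terms", "add", "replace", "concepts") are illustrative labels for possible ways of combining terms with concepts, chi-square feature selection is omitted, and macro_f1 only shows how a macro-averaged F1 (the unweighted mean of per-category F1, so every category counts equally) is computed.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy stand-in for WordNet: maps a term to a more generic
# concept (e.g. a hypernym). A real system would query WordNet instead.
HYPERNYMS = {
    "car": "motor_vehicle",
    "automobile": "motor_vehicle",
    "truck": "motor_vehicle",
    "apple": "edible_fruit",
    "banana": "edible_fruit",
}


def represent(tokens, strategy="add"):
    """Build a term-frequency vector, optionally enriched with concepts.

    Illustrative strategies (assumed, not taken from the paper):
      "terms"    - plain bag of words (baseline)
      "add"      - keep every term and add its concept as an extra feature
      "replace"  - replace a term by its concept when one is known
      "concepts" - keep only the concept features
    """
    vec = Counter()
    for tok in tokens:
        concept = HYPERNYMS.get(tok)
        if strategy == "terms":
            vec[tok] += 1
        elif strategy == "add":
            vec[tok] += 1
            if concept:
                vec["C:" + concept] += 1  # prefix keeps concepts distinct from terms
        elif strategy == "replace":
            key = "C:" + concept if concept else tok
            vec[key] += 1
        elif strategy == "concepts":
            if concept:
                vec["C:" + concept] += 1
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return vec


def cosine(u, v):
    """Cosine similarity of two sparse vectors; cosine distance is 1 - this."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def macro_f1(per_class):
    """per_class maps category -> (tp, fp, fn); returns the mean per-class F1."""
    scores = []
    for tp, fp, fn in per_class.values():
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    d1 = ["car", "engine"]
    d2 = ["automobile", "repair"]
    for s in ("terms", "add", "replace", "concepts"):
        print(s, round(cosine(represent(d1, s), represent(d2, s)), 3))
```

The two toy documents share no terms, so their bag-of-words cosine similarity is 0.0; with the "add" strategy they share the motor_vehicle concept and the similarity rises to 1/3, with "replace" to 0.5, and with "concepts" to 1.0. This shows, on a deliberately tiny example, how generic concepts can expose relations between terms that a plain bag of words ignores.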
Pages: 16-24
Number of pages: 9