The Effects of Globalization Functions on Feature Weighting for Text Classification

Cited by: 0
Authors
Dogan, Turgut [1 ]
Uysal, Alper Kursat [1 ]
Affiliations
[1] Eskisehir Teknik Univ, Bilgisayar Muhendisligi, Eskisehir, Turkey
Source
2018 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND DATA PROCESSING (IDAP) | 2018
Keywords
Text classification; feature weighting; term weighting; globalization functions; feature selection
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text classification makes textual documents easier to access, index, and store by assigning them to categories. Efficient categorization depends not only on using an appropriate feature set but also on assigning appropriate weights to the features. This has drawn researchers' attention to feature weighting methods for text classification. Some feature weighting methods in the literature generate a single global weight score for each feature, while others generate class-based scores for each feature in the dataset. In this study, the impact of globalization functions on feature weighting for text classification is investigated in detail. For this purpose, experiments were carried out on 3 benchmark datasets using 3 feature weighting methods, 2 feature globalization functions, and 2 classifiers. Various feature dimensions were also used in order to analyze the dependency between globalization functions and feature sizes. In other words, the impact of two globalization functions was tested for three feature weighting methods, namely TF.MI, TF.CHI2, and TF.PS. Experimental results obtained with SVM and KNN classifiers on the Reuters-21578, Mininew20, and WebKB datasets reveal that choosing an appropriate globalization function for a feature weighting method may improve classification performance, depending on the experimental settings used.
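To illustrate what a globalization function does, the following is a minimal sketch, not the authors' implementation. The abstract does not name the two functions studied; the sketch assumes two choices that are common in the feature selection literature, the maximum over classes and the class-prior-weighted sum. The function names, the toy scores, and the priors are all hypothetical.

```python
from typing import Dict, List, Sequence


def globalize_max(class_scores: Sequence[float]) -> float:
    """Global score = the largest class-based score (assumed choice)."""
    return max(class_scores)


def globalize_weighted_sum(class_scores: Sequence[float],
                           class_priors: Sequence[float]) -> float:
    """Global score = sum over classes of P(class) * score (assumed choice)."""
    if len(class_scores) != len(class_priors):
        raise ValueError("one prior is required per class score")
    return sum(p * s for p, s in zip(class_priors, class_scores))


def top_features(scores: Dict[str, List[float]],
                 priors: Sequence[float],
                 k: int,
                 how: str = "max") -> List[str]:
    """Rank features by their globalized score and keep the best k."""
    if how == "max":
        global_scores = {f: globalize_max(s) for f, s in scores.items()}
    elif how == "weighted_sum":
        global_scores = {f: globalize_weighted_sum(s, priors)
                         for f, s in scores.items()}
    else:
        raise ValueError(f"unknown globalization function: {how}")
    return sorted(global_scores, key=global_scores.get, reverse=True)[:k]


# Hypothetical class-based scores for three features over three classes.
toy_scores = {
    "goal": [0.9, 0.1, 0.0],
    "the":  [0.3, 0.3, 0.3],
    "bank": [0.1, 0.6, 0.2],
}
toy_priors = [0.5, 0.3, 0.2]  # hypothetical class probabilities

by_max = top_features(toy_scores, toy_priors, k=2, how="max")
by_sum = top_features(toy_scores, toy_priors, k=2, how="weighted_sum")
print(by_max, by_sum)
```

On this toy input the two functions keep different feature pairs: the maximum favors features that are strong for a single class, while the weighted sum favors features with moderate scores across frequent classes. This is the kind of dependency between globalization function and feature set size that the study examines.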
Pages: 6