A Hybrid Attribute Selection Approach for Text Classification

Cited by: 0
Authors
Chou, Chen-Huei
Sinha, Atish P. [1]
Zhao, Huimin [1]
Affiliations
[1] Univ Wisconsin Milwaukee, Sheldon B Lubar Sch Business, Milwaukee, WI USA
Source
JOURNAL OF THE ASSOCIATION FOR INFORMATION SYSTEMS | 2010 / Vol. 11 / No. 09
Keywords
text mining; text classification; data mining; attribute selection; Internet abuse detection; INTERNET ABUSE; WEB USAGE; WORKPLACE; FILTER
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The application of text mining in organizations is growing. Text classification, an important type of text mining problem, is characterized by a large attribute space and entails an efficient and effective attribute selection procedure. There are two general attribute selection approaches: the filter approach and the wrapper approach. While the wrapper approach is potentially more effective in finding the best attribute subset, it is cost-prohibitive in most text classification applications. In this paper, we propose a hybrid attribute selection approach that is both efficient and effective for text classification problems. We apply the proposed approach to detect and prevent Internet abuse in the workplace, which is becoming a major problem in modern organizations. The empirical evaluations we conducted using a variety of classification algorithms, indexing schemes, and attribute selection methods demonstrate the utility of the proposed approach. We found that combining the filter and wrapper approaches not only boosts the accuracies of text classifiers but also brings down the computational costs significantly.
Pages: 491-519
Page count: 29