A customizable text classifier for text mining

Cited by: 0
Authors
Zhang, Yun-Liang [1]
Zhang, Quan [2]
Affiliations
[1] Institute of Acoustics, Graduate School, Chinese Academy of Sciences
[2] Institute of Acoustics, Chinese Academy of Sciences
Keywords
Natural Language Processing (NLP); Text categorization; Text mining
DOI
10.2481/dsj.6.S904
Abstract
Text mining deals with complex, unstructured texts, and it usually requires a collection of texts specific to one or more domains. We have developed a customizable text classifier that lets users mine such a collection automatically. The classifier is derived from the sentence category of HNC theory and the corresponding techniques. It can start from only a few texts, and it can adjust itself automatically or be adjusted by the user. The user can also control how many domains are chosen and set the standard by which texts are selected, depending on demand and on how much material is available. The performance of the classifier varies with the user's choices.
Pages: S904-S909
Number of pages: 5