A fast algorithm for hierarchical text classification

Cited by: 0
Authors
Chuang, WT [1]
Tiyyagura, A
Yang, J
Giuffrida, G
Affiliations
[1] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90095 USA
[2] Iowa State Univ, Dept Comp Sci, Ames, IA 50011 USA
[3] HRL Labs LLC, Malibu, CA 90265 USA
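Source
DATA WAREHOUSING AND KNOWLEDGE DISCOVERY, PROCEEDINGS | 2000 / Vol. 1874
Keywords
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract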
Text classification is becoming more important with the proliferation of the Internet and the huge amount of data it transfers. We present an efficient algorithm for text classification that uses hierarchical classifiers based on a concept hierarchy. A simple TFIDF classifier is trained on sample data and then used to classify new data. Despite its simplicity, experiments on Web pages and TV closed captions demonstrate high classification accuracy. Applying feature subset selection techniques improves performance. The algorithm is computationally efficient, being bounded by O(n log n) for n samples.
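
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea: one TFIDF prototype classifier per internal node of a concept hierarchy, with a document routed from the root down to a leaf. Everything named here (`TfidfNodeClassifier`, `train`, `classify`, the smoothed IDF formula, the toy categories and sample texts) is an assumption for illustration and is not taken from the paper. Feature subset selection is omitted, and the sketch makes no attempt to reproduce the O(n log n) bound.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization is good enough for a sketch.
    return text.lower().split()


class TfidfNodeClassifier:
    """Prototype-vector TFIDF classifier for the children of one hierarchy node."""

    def fit(self, docs, labels):
        n = len(docs)
        df = Counter()
        for doc in docs:
            df.update(set(tokenize(doc)))
        # Smoothed IDF (an assumed variant, not necessarily the paper's).
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
        sums = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            for term, weight in self._vector(doc).items():
                sums[label][term] += weight
        # One normalized prototype vector per child category.
        self.prototypes = {y: self._normalize(v) for y, v in sums.items()}
        return self

    def _vector(self, doc):
        tf = Counter(tokenize(doc))
        return self._normalize(
            {t: c * self.idf[t] for t, c in tf.items() if t in self.idf}
        )

    @staticmethod
    def _normalize(vec):
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}

    def predict(self, doc):
        v = self._vector(doc)

        def cosine(label):
            proto = self.prototypes[label]
            return sum(w * proto.get(t, 0.0) for t, w in v.items())

        return max(self.prototypes, key=cosine)


def train(samples, prefix=()):
    """Train one classifier per internal node; samples are (text, path) pairs."""
    k = len(prefix)
    relevant = [(t, p) for t, p in samples if p[:k] == prefix and len(p) > k]
    if not relevant:
        return None  # prefix is a leaf category
    labels = [p[k] for _, p in relevant]
    clf = TfidfNodeClassifier().fit([t for t, _ in relevant], labels)
    children = {lab: train(samples, prefix + (lab,)) for lab in set(labels)}
    return clf, children


def classify(model, text):
    """Descend from the root, picking the best child at each level."""
    path = []
    while model is not None:
        clf, children = model
        label = clf.predict(text)
        path.append(label)
        model = children[label]
    return tuple(path)


# Hypothetical toy hierarchy: two top-level topics with two leaves each.
samples = [
    ("football goal striker match", ("sports", "soccer")),
    ("tennis racket serve match", ("sports", "tennis")),
    ("stock market shares trading", ("business", "finance")),
    ("startup founder venture funding", ("business", "startups")),
]
model = train(samples)
print(classify(model, "striker scores goal"))
```

In this sketch each document is compared only against the children of the node it has reached, so a query touches one small classifier per level instead of every leaf category at once.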
Pages: 409-418
Page count: 10