Robust Algorithms for Combining Multiple Term Weighting Vectors for Document Classification

Times cited: 0
Authors
Kim, Minyoung [1]
Affiliations
[1] Seoul Natl Univ Sci & Technol, Dept Elect & IT Media Engn, Seoul, South Korea
Keywords
Machine learning; Document/text classification; Term weighting; Optimization
DOI
10.5391/IJFIS.2016.16.2.81
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Term weighting is a popular technique that weights term features to improve accuracy in document classification. While several successful term weighting algorithms have been suggested, none of them appears to perform consistently well across different data domains. In this paper, we propose several methods for combining different term weight vectors to yield a robust document classifier that performs consistently well on diverse datasets. Specifically, we suggest two approaches: i) learning a single weight vector that lies in the convex hull of the base vectors while minimizing the class prediction loss, and ii) a mini-max classifier that aims for robustness across the individual weight vectors by minimizing the loss of the worst-performing strategy among the base vectors. We provide efficient solution methods for these optimization problems. The effectiveness and robustness of the proposed approaches are demonstrated on several benchmark document datasets, where they significantly outperform existing term weighting methods.
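The two objectives named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formulations, so everything concrete here is an assumption made for illustration: the logistic loss, the linear classifier parameter `theta`, and the hypothetical function names `combine_convex`, `logistic_loss`, and `worst_case_loss`. The sketch only evaluates the two objectives; it does not reproduce the paper's solution methods.

```python
import numpy as np

def combine_convex(base_weights, alpha):
    """Return sum_k alpha_k * w_k, a point in the convex hull of the base vectors.

    base_weights: (K, V) array, one term-weight vector per base strategy.
    alpha: (K,) mixing coefficients, non-negative and summing to 1.
    """
    alpha = np.asarray(alpha, dtype=float)
    if np.any(alpha < 0) or not np.isclose(alpha.sum(), 1.0):
        raise ValueError("alpha must be non-negative and sum to 1")
    return alpha @ np.asarray(base_weights, dtype=float)

def logistic_loss(X, y, term_weights, theta):
    """Mean logistic loss of a linear classifier on term-weighted features.

    The logistic loss is an assumption; the abstract only says "class
    prediction loss". Labels y are expected in {-1, +1}.
    """
    margins = y * ((X * term_weights) @ theta)
    return float(np.mean(np.logaddexp(0.0, -margins)))

def worst_case_loss(X, y, base_weights, theta):
    """Loss of the worst-performing base strategy (the mini-max objective)."""
    return max(logistic_loss(X, y, w, theta) for w in base_weights)

# Toy data: 2 documents, 2 terms, 2 base weighting strategies (all made up).
X = np.array([[1.0, 2.0], [3.0, 1.0]])
y = np.array([1.0, -1.0])
W = np.array([[1.0, 0.0], [0.0, 1.0]])
theta = np.array([0.5, -0.5])

w_mix = combine_convex(W, [0.25, 0.75])
print("convex-combination loss:", logistic_loss(X, y, w_mix, theta))
print("worst-case loss:", worst_case_loss(X, y, W, theta))
```

Under these assumptions the margin is linear in the term-weight vector, so for a fixed `theta` the loss at any convex combination cannot exceed the worst-case loss over the base vectors. Approach i) would then search over `alpha` (and `theta`), while approach ii) would search for the `theta` that minimizes `worst_case_loss`.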
Pages: 81-86
Number of pages: 6