Active Learning for Text Classification: Using the LSI Subspace Signature Model

Cited by: 0
Authors
Zhu, Weizhong [1]
Allen, Robert B. [2]
Affiliations
[1] City Hope Med Ctr, Los Angeles, CA USA
[2] Yonsei Univ, Dept Lib & Informat Sci, Seoul, South Korea
Source
2014 INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA) | 2014
Keywords
active learning; classifiers; Latent Semantic Indexing Subspace Signature Model; text categorization; REGRESSION;
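DOI
Not available
Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code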
0808 ; 0809 ;
Abstract
Supervised learning methods rely on large sets of labeled training examples. However, large training sets are rare and expensive to create. In this research, the Latent Semantic Indexing Subspace Signature Model (LSISSM) is applied to labeling for active learning of unstructured text. Based on Singular Value Decomposition (SVD), LSISSM represents terms and documents as semantic signatures by the distribution of their local statistical contribution across the top-ranking LSI latent dimensions after dimension reduction. When applied to an unlabeled text corpus, LSISSM finds the most important samples and terms according to their global statistical contribution ranking in the corresponding LSI subspaces, without prior knowledge of labels and without depending on the model-loss functions of the classifiers. These sample subsets also effectively maintain the sampling distribution of the whole corpus. Furthermore, tests demonstrate that the sample subsets with the optimized term subsets substantially improve the learning accuracy across three standard classifiers.
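The abstract does not give the formulas behind the signatures or the contribution ranking, so the following is only a rough sketch of the general idea and not the authors' method. It assumes that an item's "contribution" to a latent dimension is its squared, singular-value-scaled coordinate in that dimension, that a "signature" is the row-normalized distribution of those contributions, and that the global ranking sorts items by total contribution. The function name `lsi_contribution_ranking` and its parameters are hypothetical.

```python
import numpy as np

def lsi_contribution_ranking(term_doc, k):
    """Rank documents and terms by an assumed contribution score
    over the top-k LSI dimensions of a terms-by-documents matrix."""
    # Thin SVD: term_doc = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k, s_k, V_k = U[:, :k], s[:k], Vt[:k, :].T

    # ASSUMPTION: contribution = squared coordinate scaled by the singular value.
    term_contrib = (U_k * s_k) ** 2   # shape (n_terms, k)
    doc_contrib = (V_k * s_k) ** 2    # shape (n_docs, k)

    def signature(contrib):
        # Distribution of each item's contribution across the k dimensions.
        totals = contrib.sum(axis=1, keepdims=True)
        return contrib / np.where(totals > 0, totals, 1.0)

    # ASSUMPTION: global ranking = descending total contribution.
    doc_rank = np.argsort(-doc_contrib.sum(axis=1))
    term_rank = np.argsort(-term_contrib.sum(axis=1))
    return doc_rank, term_rank, signature(doc_contrib), signature(term_contrib)

# Toy corpus: 4 terms (rows) by 3 documents (columns); the last document is empty.
X = np.array([[3.0, 0.0, 0.0],
              [2.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0]])
doc_rank, term_rank, doc_sig, term_sig = lsi_contribution_ranking(X, k=2)
```

In an active-learning setting, the top entries of `doc_rank` would be the candidates sent for labeling and the top entries of `term_rank` the retained features. No labels and no classifier loss are involved in either ranking, which matches the property the abstract emphasizes.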
Pages: 149 - 155
Page count: 7