Active Learning for Text Classification: Using the LSI Subspace Signature Model

Authors
Zhu, Weizhong [1]
Allen, Robert B. [2]
Affiliations
[1] City Hope Med Ctr, Los Angeles, CA USA
[2] Yonsei Univ, Dept Lib & Informat Sci, Seoul, South Korea
Source
2014 INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA) | 2014
Keywords
active learning; classifiers; Latent Semantic Indexing Subspace Signature Model; text categorization; REGRESSION
DOI
Not available
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract
Supervised learning methods rely on large sets of labeled training examples. However, large training sets are rare and expensive to create. In this research, the Latent Semantic Indexing Subspace Signature Model (LSISSM) is applied to labeling for active learning of unstructured text. Based on Singular Value Decomposition (SVD), LSISSM represents terms and documents as semantic signatures, defined by the distribution of their local statistical contributions across the top-ranking LSI latent dimensions after dimension reduction. When applied to an unlabeled text corpus, LSISSM finds the most important samples and terms according to their global statistical contribution ranking in the corresponding LSI subspaces, without prior knowledge of labels and without depending on the classifiers' model-loss functions. These sample subsets also effectively preserve the sampling distribution of the whole corpus. Furthermore, tests demonstrate that the sample subsets, combined with the optimized term subsets, substantially improve learning accuracy across three standard classifiers.
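The abstract does not give the LSISSM formulas, so the following is only a rough sketch of the general idea it describes: decompose a term-document matrix with SVD, keep the top k latent dimensions, describe each document by how its weight is distributed across those dimensions, and rank documents by their total weight there. The function name `lsi_rank_documents`, the squared-coordinate scoring rule, and the toy matrix are all assumptions made for illustration and should not be read as the authors' method.

```python
import numpy as np


def lsi_rank_documents(term_doc, k):
    """Rank documents by their total weight in the top-k LSI dimensions.

    Hypothetical illustration only; the scoring rule is an assumption,
    not the LSISSM definition from the paper.
    """
    # Thin SVD: term_doc (terms x docs) = U @ diag(s) @ Vt
    _, s, Vt = np.linalg.svd(term_doc, full_matrices=False)

    # Document coordinates in the top-k latent dimensions (docs x k),
    # scaled by the singular values.
    doc_coords = (s[:k, None] * Vt[:k, :]).T

    # Assumed "contribution": squared coordinate per latent dimension.
    contrib = doc_coords ** 2
    totals = contrib.sum(axis=1, keepdims=True)

    # Signature: per-document distribution of contribution across the k
    # dimensions (rows sum to 1 unless a document has zero weight).
    signatures = np.divide(
        contrib, totals, out=np.zeros_like(contrib), where=totals > 0
    )

    # Global score: total contribution inside the k-dimensional subspace.
    scores = totals.ravel()
    ranking = np.argsort(-scores, kind="stable")
    return ranking, signatures, scores


# Toy term-document matrix (4 terms x 4 documents), made up for the example.
term_doc = np.array(
    [
        [2.0, 0.0, 1.0, 0.0],
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 3.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 1.0],
    ]
)

ranking, signatures, scores = lsi_rank_documents(term_doc, k=2)
print("documents ordered by assumed importance:", ranking)
```

In an active-learning setting, the highest-ranked documents would be the first candidates sent for labeling. The same procedure could be applied to the rows of the matrix to rank terms, again under the same assumed scoring rule.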
Pages: 149-155
Number of pages: 7