Contextual feature selection for text classification

Cited by: 6
Authors
Paradis, Francois [1 ]
Nie, Jian-Yun [1 ]
Affiliations
[1] Univ Montreal, DIRO, Montreal, PQ, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
classification; named entities; feature selection; text filtering
DOI
10.1016/j.ipm.2006.07.006
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We present a simple approach for the classification of "noisy" documents using bigrams and named entities. The approach combines conventional feature selection with a contextual approach to filter out passages around selected features. Originally designed for call-for-tender documents, the method can be useful for other web collections that also contain non-topical content. Experiments are conducted on our in-house collection as well as on the 4-Universities data set, Reuters 21578 and 20 Newsgroups. We find a significant improvement on our collection and the 4-Universities data set (10.9% and 4.1%, respectively). Although the best results are obtained by combining bigrams and named entities, the impact of the latter is not found to be significant. (c) 2006 Published by Elsevier Ltd.
Pages: 344-352
Page count: 9