Knowledge Based Dimensionality Reduction for Technical Text Mining

Cited by: 0
Authors
Shalaby, Walid [1]
Zadrozny, Wlodek [1]
Gallagher, Sean [1]
Affiliations
[1] Univ North Carolina Charlotte, Dept Comp Sci, Charlotte, NC 28223 USA
Keywords
Dimensionality Reduction; Feature Selection; Text Classification; Patent Classification; Knowledge Bases;
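DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract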
In this paper we propose a novel technique for dimensionality reduction using freely available online knowledge bases. The complexity of our method is linear in the size of the full feature set, making it efficiently applicable to very large and complex datasets. We demonstrate the approach by investigating its effectiveness on patent data, the largest freely available body of technical text. We report empirical results on classification of the CLEF-IP 2010 dataset using bigram features supported by mentions in the Wikipedia, Wiktionary, and Google Books knowledge bases. We achieve a 13-fold reduction in the number of bigram features and a 1.7% increase in classification accuracy over the unigram baseline. These results give concrete evidence that significant accuracy improvements and a massive reduction in dimensionality can be achieved with our approach, thereby helping to alleviate the trade-off between task complexity and accuracy.
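The filtering idea in the abstract can be illustrated with a minimal sketch. It assumes the knowledge base is already available as an in-memory set of bigrams; the abstract does not describe how mentions are actually looked up, so the names `extract_bigrams`, `filter_bigrams`, and `kb_phrases` are hypothetical and the toy data is invented for illustration. Each candidate bigram is checked once against the set, so the cost grows linearly with the number of candidate features.

```python
from typing import Iterable, List, Set, Tuple

Bigram = Tuple[str, str]


def extract_bigrams(tokens: List[str]) -> List[Bigram]:
    """Return all adjacent token pairs, in order of appearance."""
    return list(zip(tokens, tokens[1:]))


def filter_bigrams(candidates: Iterable[Bigram], kb_phrases: Set[Bigram]) -> List[Bigram]:
    """Keep only the bigrams that are mentioned in the knowledge base.

    Assumption: kb_phrases is a plain Python set standing in for
    lookups against resources such as Wikipedia or Wiktionary.
    Each membership test is O(1) on average, so one pass over the
    candidates is linear in the size of the full feature set.
    """
    return [bigram for bigram in candidates if bigram in kb_phrases]


# Toy data, invented for illustration only.
tokens = "the fuel cell stack uses a heat exchanger".split()
kb_phrases = {("fuel", "cell"), ("heat", "exchanger")}

candidates = extract_bigrams(tokens)
kept = filter_bigrams(candidates, kb_phrases)
print(len(candidates), "candidate bigrams;", len(kept), "kept:", kept)
```

In this toy case, pairs such as ("the", "fuel") or ("uses", "a") are discarded because the stand-in knowledge base never mentions them, while the two technical phrases survive as features.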
Pages: 6