Knowledge Based Dimensionality Reduction for Technical Text Mining

Authors
Shalaby, Walid [1]
Zadrozny, Wlodek [1]
Gallagher, Sean [1]
Affiliation
[1] Univ North Carolina Charlotte, Dept Comp Sci, Charlotte, NC 28223 USA
Keywords
Dimensionality Reduction; Feature Selection; Text Classification; Patent Classification; Knowledge Bases
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline code
0812
Abstract
In this paper, we propose a novel technique for dimensionality reduction that uses freely available online knowledge bases. The complexity of our method is linear in the size of the full feature set, which makes it efficient to apply to very large and complex datasets. We demonstrate the approach by investigating its effectiveness on patent data, the largest freely available body of technical text. We report empirical results on classification of the CLEF-IP 2010 dataset using bigram features supported by mentions in the Wikipedia, Wiktionary, and GoogleBooks knowledge bases. We achieve a 13-fold reduction in the number of bigram features and a 1.7% increase in classification accuracy over the unigram baseline. These results give concrete evidence that our approach can deliver significant accuracy improvements together with a massive reduction in dimensionality, thereby helping to alleviate the trade-off between task complexity and accuracy.
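The core idea of the abstract, keeping only those bigram features that a knowledge base "supports", can be illustrated with a minimal sketch. It assumes the knowledge bases (Wikipedia, Wiktionary, GoogleBooks) have already been reduced to a plain set of lowercase phrase strings; the abstract does not specify how mentions are matched or thresholded, so the names `kb_phrases`, `extract_bigrams`, `build_feature_set`, and `filter_by_kb` and the exact-match rule are hypothetical, not the authors' implementation. The filter makes one pass over the distinct bigrams with an average O(1) hash lookup each, which matches the linear-in-feature-set-size complexity the abstract claims.

```python
from collections import Counter
from typing import Iterable


def extract_bigrams(tokens: list[str]) -> list[str]:
    """Return adjacent token pairs as space-joined lowercase strings."""
    return [f"{a} {b}".lower() for a, b in zip(tokens, tokens[1:])]


def build_feature_set(docs: Iterable[list[str]]) -> Counter:
    """Count every bigram across the corpus (the full, unreduced feature set)."""
    counts: Counter = Counter()
    for tokens in docs:
        counts.update(extract_bigrams(tokens))
    return counts


def filter_by_kb(features: Counter, kb_phrases: set[str]) -> Counter:
    """Keep only bigrams found in the (hypothetical) knowledge-base phrase set.

    Assumption: "supported by a knowledge base" is modelled as exact,
    case-insensitive membership. Each test is an average O(1) set lookup,
    so the pass is linear in the number of distinct bigrams.
    """
    return Counter({bg: n for bg, n in features.items() if bg in kb_phrases})


# Toy usage: two tokenized "documents" and a stand-in knowledge-base phrase set.
docs = [
    "the neural network uses a hidden layer".split(),
    "a neural network with one hidden layer".split(),
]
kb = {"neural network", "hidden layer", "machine learning"}

full = build_feature_set(docs)
reduced = filter_by_kb(full, kb)
print(len(full), len(reduced))    # → 10 2
print(sorted(reduced.items()))    # → [('hidden layer', 2), ('neural network', 2)]
```

In this toy corpus the filter removes generic pairs such as "uses a" or "with one" and keeps the domain terms, a 5-fold reduction; the 13-fold figure reported above comes from the paper's CLEF-IP 2010 experiments, not from this sketch.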
Pages: 6
Related papers (50 total)
  • [1] Effective Pattern Discovery and Dimensionality Reduction for Text Under Text Mining
    Vijayakumar, T.
    Priya, R.
    Palanisamy, C.
    ARTIFICIAL INTELLIGENCE AND EVOLUTIONARY ALGORITHMS IN ENGINEERING SYSTEMS, VOL 2, 2015, 325 : 615 - 623
  • [2] A text mining-based approach for modelling technical knowledge evolution in patents
    Li, G.
    Jiang, Z.
    Li, X.
    International Journal of Technology, Policy and Management, 2020, 20 (04) : 318 - 339
  • [3] An empirical study on dimensionality optimization in text mining for linguistic knowledge acquisition
    Kim, YS
    Chang, JH
    Zhang, BT
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, 2003, 2637 : 111 - 116
  • [4] Exploration of dimensionality reduction for text visualization
    Huang, SP
    Ward, MO
    Rundensteiner, EA
    THIRD INTERNATIONAL CONFERENCE ON COORDINATED & MULTIPLE VIEWS IN EXPLORATORY VISUALIZATION, PROCEEDINGS, 2005, : 63 - 74
  • [5] Dimensionality Reduction for Text Using LLE
    He, Chuan
    Dong, Zhe
    Li, Ruifan
    Zhong, Yixin
    IEEE NLP-KE 2008: PROCEEDINGS OF INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING, 2008, : 451 - 457
  • [6] Abstracting for Dimensionality Reduction in Text Classification
    McAllister, Richard A.
    Angryk, Rafal A.
    INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2013, 28 (02) : 115 - 138
  • [7] Self-attention based Text Knowledge Mining for Text Detection
    Wan, Qi
    Ji, Haoqin
    Shen, Linlin
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 5979 - 5988
  • [8] Semantic Text Deep Mining Based on knowledge element
    Wen, Youkui
    Wen, Hao
    2010 THE 3RD INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND INDUSTRIAL APPLICATION (PACIIA2010), VOL I, 2010, : 426 - 429
  • [9] Text Analysis and Knowledge Mining
    Nasukawa, Tetsuya
    2009 EIGHTH INTERNATIONAL SYMPOSIUM ON NATURAL LANGUAGE PROCESSING, PROCEEDINGS, 2009, : 1 - 2
  • [10] Taxonomic Dimensionality Reduction in Bayesian Text Classification
    McAllister, Richard
    Sheppard, John
    2012 11TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2012), VOL 1, 2012, : 508 - 513