Fast Text Classification Using Randomized Explicit Semantic Analysis

Cited by: 9
Authors
Musaev, Aibek [1]
Wang, De [1]
Shridhar, Saajan [1]
Pu, Calton [1]
Affiliation
[1] Georgia Institute of Technology, Atlanta, GA 30332, USA
Source
2015 IEEE 16th International Conference on Information Reuse and Integration | 2015
Funding
U.S. National Science Foundation;
Keywords
text classification; explicit semantic analysis; social media; event detection;
DOI
10.1109/IRI.2015.62
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Document classification, or document categorization, is one of the most studied problems in computer science because of its practical importance. The task is to use a document's text to assign it to one or more classes from a predefined set. We propose a new approach to fast text classification using randomized explicit semantic analysis (RS-ESA). It builds on explicit semantic analysis (ESA), a state-of-the-art approach for word sense disambiguation based on Wikipedia, the largest encyclopedia in existence. Our method reduces the Wikipedia repository by random sampling, yielding throughput an order of magnitude higher than that of the original ESA. The RS-ESA approach was implemented as part of the LITMUS project, which needs to classify social media data as relevant or irrelevant to landslides as a natural disaster. We demonstrate that our approach achieves 96% precision when classifying social media landslide data collected in December 2014. We also demonstrate the generality of the approach by using it to separate factual texts from fictional ones, based on Wikipedia articles and fan fiction stories, where it achieves 97% precision.
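To make the idea concrete, here is a minimal Python sketch of ESA-style classification over a randomly sampled concept set. It is not the authors' implementation: the toy `articles` dictionary is a hypothetical stand-in for Wikipedia, and uniform random sampling of concepts, log-scaled tf-idf weights, and a nearest-centroid cosine classifier are all assumptions chosen for illustration. The function names (`sample_concepts`, `build_inverted_index`, `esa_vector`, `classify`) are hypothetical.

```python
import math
import random
import re
from collections import Counter


def tokenize(text):
    # Lowercase alphabetic tokens; no stemming or stop-word removal (simplification).
    return re.findall(r"[a-z]+", text.lower())


def sample_concepts(articles, k, seed=0):
    # Randomized reduction (assumed uniform): keep k of the concept articles.
    rng = random.Random(seed)
    titles = sorted(articles)
    kept = rng.sample(titles, min(k, len(titles)))
    return {title: articles[title] for title in kept}


def build_inverted_index(concepts):
    # Map each word to {concept: tf-idf weight} over the (sampled) concepts.
    tfs = {c: Counter(tokenize(text)) for c, text in concepts.items()}
    df = Counter(w for tf in tfs.values() for w in tf)
    n = len(concepts)
    index = {}
    for c, tf in tfs.items():
        for w, count in tf.items():
            idf = math.log(n / df[w]) + 1.0  # +1 keeps weight for words found in every concept
            index.setdefault(w, {})[c] = (1.0 + math.log(count)) * idf
    return index


def esa_vector(text, index):
    # Interpretation vector: sum of each word's concept weights, scaled by its count.
    vec = Counter()
    for w, count in Counter(tokenize(text)).items():
        for c, weight in index.get(w, {}).items():
            vec[c] += count * weight
    return vec


def cosine(a, b):
    dot = sum(v * b[c] for c, v in a.items())  # Counter returns 0 for missing keys
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify(text, labeled, index):
    # Nearest centroid in concept space: sum each class's example vectors,
    # then pick the class whose centroid has the highest cosine similarity.
    centroids = {}
    for label, examples in labeled.items():
        centroid = Counter()
        for example in examples:
            centroid.update(esa_vector(example, index))
        centroids[label] = centroid
    doc = esa_vector(text, index)
    return max(centroids, key=lambda label: cosine(doc, centroids[label]))


# Toy stand-ins for Wikipedia articles (hypothetical data).
articles = {
    "Landslide": "A landslide is a mass movement of rock, soil and debris down a slope, often after heavy rain.",
    "Mudflow": "A mudflow is a fast flow of mud and debris down a slope after rain.",
    "Earthquake": "An earthquake shakes the ground and can trigger a landslide on a slope.",
    "Football": "Football is a team sport played with a ball on a pitch.",
    "Music": "Music is an art form of sound, rhythm and melody played by a band.",
    "Election": "An election is a vote to choose a candidate for office.",
}
labeled = {
    "relevant": ["Heavy rain caused a landslide of mud and debris.", "Rock and soil slid down the slope."],
    "irrelevant": ["The band played music at the football game.", "Vote for the candidate in the election."],
}

reduced_index = build_inverted_index(sample_concepts(articles, k=4, seed=42))
print(classify("Debris slid down the slope after rain.", labeled, reduced_index))
```

Lowering `k` shrinks the inverted index and the number of concept weights `esa_vector` has to accumulate per word, which is the intuition behind the throughput gain the abstract reports; how much precision a given `k` costs is not something this toy example can show.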
Pages: 364-371
Number of pages: 8