Fast Text Classification Using Randomized Explicit Semantic Analysis

Cited by: 9
Authors
Musaev, Aibek [1]
Wang, De [1]
Shridhar, Saajan [1]
Pu, Calton [1]
Affiliations
[1] Georgia Inst Technol, Atlanta, GA 30332 USA
Source
2015 IEEE 16TH INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION | 2015
Funding
National Science Foundation (USA);
Keywords
text classification; explicit semantic analysis; social media; event detection;
DOI
10.1109/IRI.2015.62
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Document classification, or document categorization, is one of the most studied areas in computer science because of its importance. The problem is to assign a document, based on its text, to one or more classes or categories from a predefined set. We propose a new approach for fast text classification using randomized explicit semantic analysis (RS-ESA). It builds on a state-of-the-art approach for word sense disambiguation based on Wikipedia, the largest encyclopedia in existence. Our method reduces the Wikipedia repository by random sampling, resulting in a throughput that is an order of magnitude higher than that of the original explicit semantic analysis. The RS-ESA approach has been implemented as part of the LITMUS project, motivated by the need to classify social media data into items that are relevant or irrelevant to landslides as a natural disaster. We demonstrate that our approach achieves 96% precision when classifying social media landslide data collected in December 2014. We also demonstrate the generality of the proposed approach by using it to separate factual texts from fictional ones, based on Wikipedia articles and fan fiction stories, where we achieve 97% precision.
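The idea in the abstract (explicit semantic analysis over a randomly sampled subset of concept articles) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the TF-IDF weighting variant, the whitespace tokenizer, and the toy articles are all assumptions made for illustration, and a real system would sample from a full Wikipedia dump.

```python
import math
import random
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (assumption; real systems use proper NLP preprocessing).
    return text.lower().split()


def build_concept_index(articles, sample_size, seed=0):
    """Randomly sample concept articles and build a term -> {concept: weight} index."""
    rng = random.Random(seed)
    titles = rng.sample(sorted(articles), min(sample_size, len(articles)))
    tf = {title: Counter(tokenize(articles[title])) for title in titles}
    # Document frequency: in how many sampled articles each term occurs.
    df = Counter(term for counts in tf.values() for term in counts)
    n = len(titles)
    index = {}
    for title, counts in tf.items():
        for term, count in counts.items():
            idf = math.log(n / df[term])
            if idf > 0:  # drop terms that occur in every sampled article
                index.setdefault(term, {})[title] = (1 + math.log(count)) * idf
    return index


def concept_vector(text, index):
    """Map a text to a weighted vector over the sampled concepts."""
    vec = Counter()
    for term, count in Counter(tokenize(text)).items():
        for title, weight in index.get(term, {}).items():
            vec[title] += count * weight
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(a[k] * b.get(k, 0.0) for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy stand-in for a Wikipedia repository (hypothetical data).
articles = {
    "Landslide": "landslide slope debris mud rock",
    "Earthquake": "earthquake seismic fault rock",
    "Music": "album song band music",
    "Football": "team goal match league",
}
index = build_concept_index(articles, sample_size=4)
```

The resulting concept vectors could then be passed to any standard classifier. A smaller `sample_size` shrinks the index and speeds up the mapping, which is the trade-off the abstract describes.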
Pages: 364-371
Number of pages: 8
Related papers
50 in total
  • [21] Automatic text classification using BPLion-neural network and semantic word processing
    Ranjan, Nihar M.
    Prasad, Rajesh S.
    IMAGING SCIENCE JOURNAL, 2018, 66 (02): 69 - 83
  • [22] Text Classification using Gated Fusion of n-gram Features and Semantic Features
    Nagar, Ajay
    Bhasin, Anmol
    Mathur, Gaurav
    COMPUTACION Y SISTEMAS, 2019, 23 (03): 1015 - 1020
  • [24] Neural network agents for learning semantic text classification
    Wermter, S
    INFORMATION RETRIEVAL, 2000, 3 (02): 87 - 103
  • [25] Neural Network Agents for Learning Semantic Text Classification
    Stefan Wermter
    Information Retrieval, 2000, 3 : 87 - 103
  • [26] Semantic similarity metric and its application in text classification
    Zhang, Pei-ying
    PROGRESS IN CIVIL ENGINEERING, PTS 1-4, 2012, 170-173 : 3711 - 3714
  • [27] Text Classification via Learning Semantic Dependency and Association
    Zhu, Guanqi
    Tao, Hanqing
    Wu, Han
    Chen, Liyi
    Liu, Ye
    Liu, Qi
    Chen, Enhong
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
  • [28] Semantic and Morphological Information Guided Chinese Text Classification
    Song, Jiayu
    Xu, Qinghua
    Liu, Wei
    Zu, Yueran
    Chen, Mengdong
    MULTIMEDIA MODELING (MMM 2020), PT II, 2020, 11962 : 14 - 26
  • [29] Semantic text classification: A survey of past and recent advances
    Altinel, Berna
    Ganiz, Murat Can
    INFORMATION PROCESSING & MANAGEMENT, 2018, 54 (06) : 1129 - 1153
  • [30] THE APPLICATION OF LATENT SEMANTIC INDEXING AND ONTOLOGY IN TEXT CLASSIFICATION
    Yang, Xi-Quan
    Sun, Na
    Sun, Tie-Li
    Cao, Xue-Ya
    Zheng, Xiao-Juan
    INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2009, 5 (12A): 4491 - 4499