Fast Text Classification Using Randomized Explicit Semantic Analysis

Cited by: 9
Authors
Musaev, Aibek [1]
Wang, De [1]
Shridhar, Saajan [1]
Pu, Calton [1]
Affiliations
[1] Georgia Inst Technol, Atlanta, GA 30332 USA
Source
2015 IEEE 16TH INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION | 2015
Funding
National Science Foundation (USA);
Keywords
text classification; explicit semantic analysis; social media; event detection;
DOI
10.1109/IRI.2015.62
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Document classification, also called document categorization, is one of the most studied areas in computer science. The problem is to assign a document, based on its text, to one or more classes or categories from a predefined set. We propose a new approach for fast text classification using randomized explicit semantic analysis (RS-ESA). It is based on a state-of-the-art approach for word sense disambiguation that uses Wikipedia, the largest encyclopedia in existence. Our method reduces the Wikipedia repository by random sampling, yielding a throughput that is an order of magnitude higher than that of the original explicit semantic analysis. The RS-ESA approach has been implemented as part of the LITMUS project to meet the need to classify Social Media data into items that are relevant or irrelevant to landslides as a natural disaster. We demonstrate that our approach achieves 96% precision when classifying Social Media landslide data collected in December 2014. We also demonstrate the generality of the proposed approach by using it to separate factual texts from fictional ones, based on Wikipedia articles and fan fiction stories, where we achieve 97% precision.
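The sampling idea in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy concept texts, the function names, the smoothed TF-IDF weighting, and the cosine comparison are all hypothetical choices made here to show how a random subset of concept articles can stand in for the full repository when building ESA-style concept vectors.

```python
import math
import random
from collections import Counter

def tokenize(text):
    # Lowercase whitespace tokenizer; keeps purely alphabetic tokens only.
    return [w for w in text.lower().split() if w.isalpha()]

def sample_concepts(concepts, k, seed=0):
    # Randomized step: keep k concept articles chosen uniformly at random.
    rng = random.Random(seed)
    names = rng.sample(sorted(concepts), k)
    return {name: concepts[name] for name in names}

def build_index(concepts):
    # Inverted index: term -> {concept name: TF-IDF weight}.
    tf = {name: Counter(tokenize(text)) for name, text in concepts.items()}
    df = Counter(term for counts in tf.values() for term in counts)
    n = len(concepts)
    index = {}
    for name, counts in tf.items():
        for term, count in counts.items():
            idf = math.log((1 + n) / (1 + df[term])) + 1  # smoothed IDF (assumed)
            index.setdefault(term, {})[name] = count * idf
    return index

def esa_vector(text, index):
    # Concept vector of a text: sum of the concept weights of its terms.
    vec = Counter()
    for term in tokenize(text):
        for name, weight in index.get(term, {}).items():
            vec[name] += weight
    return vec

def cosine(a, b):
    # Cosine similarity between two sparse concept vectors.
    dot = sum(w * b[c] for c, w in a.items() if c in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Toy stand-in for Wikipedia articles (invented for illustration only).
concepts = {
    "Landslide": "landslide slope soil rock debris movement",
    "Rainfall": "rain storm water flood soil",
    "Election": "vote election landslide victory candidate",
    "Music": "band album song landslide fleetwood",
    "Geology": "rock soil mineral slope earth",
}

full_index = build_index(concepts)
small_index = build_index(sample_concepts(concepts, k=3, seed=1))

post = esa_vector("heavy rain caused a landslide on the slope", small_index)
reference = esa_vector("soil and rock moved down the slope", small_index)
print(cosine(post, reference))
```

A smaller sample means a smaller inverted index and fewer concept weights to accumulate per term, which is where a throughput gain would come from in a sketch like this; how the sample size affects precision is a question the abstract answers only through its reported results.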
Pages: 364 - 371
Number of pages: 8
Related papers
50 in total
  • [41] Emotionally charged text classification with deep learning and sentiment semantic
    Huan, Jeow Li
    Sekh, Arif Ahmed
    Quek, Chai
    Prasad, Dilip K.
    NEURAL COMPUTING & APPLICATIONS, 2022, 34 (03) : 2341 - 2351
  • [42] Semantic Text Classification for Supporting Automated Compliance Checking in Construction
    Salama, Dareen M.
    El-Gohary, Nora M.
    JOURNAL OF COMPUTING IN CIVIL ENGINEERING, 2016, 30 (01)
  • [43] Exploring semantic awareness via graph representation for text classification
    Li, Yahui
    Liu, Yifan
    Zhu, Zhenfang
    Liu, Peiyu
    APPLIED INTELLIGENCE, 2023, 53 (02) : 2088 - 2097
  • [44] Exploring semantic awareness via graph representation for text classification
    Yahui Li
    Yifan Liu
    Zhenfang Zhu
    Peiyu Liu
    Applied Intelligence, 2023, 53 : 2088 - 2097
  • [45] Emotionally charged text classification with deep learning and sentiment semantic
    Jeow Li Huan
    Arif Ahmed Sekh
    Chai Quek
    Dilip K. Prasad
    Neural Computing and Applications, 2022, 34 : 2341 - 2351
  • [46] A Novel Higher-Order Semantic Kernel for Text Classification
    Altinel, Berna
    Ganiz, Murat Can
    Diri, Banu
    2013 INTERNATIONAL CONFERENCE ON ELECTRONICS, COMPUTER AND COMPUTATION (ICECCO), 2013, : 216 - 219
  • [47] A Multi-Granularity Semantic Extraction Method for Text Classification
    Li, Min
    Liu, Zeyu
    Li, Gang
    Han, Delong
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT XIII, ICIC 2024, 2024, 14874 : 224 - 236
  • [48] Using grammars for text classification
    Kroha, P.
    Reichel, T.
    ICEIS 2007: PROCEEDINGS OF THE NINTH INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS: ARTIFICIAL INTELLIGENCE AND DECISION SUPPORT SYSTEMS, 2007, : 259 - 264
  • [49] SRFW: A simple, fast and effective text classification algorithm
    Deng, ZH
    Tang, SW
    Yang, DQ
    Zhang, M
    Wu, XB
    Yang, M
    2002 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-4, PROCEEDINGS, 2002, : 1267 - 1271
  • [50] Text classification using capsules
    Kim, Jaeyoung
    Jang, Sion
    Park, Eunjeong
    Choi, Sungchul
    NEUROCOMPUTING, 2020, 376 : 214 - 221