Classification of Text Documents Based on a Probabilistic Topic Model

Cited by: 0
Authors
Karpovich, S. N. [1]
Smirnov, A. V. [2]
Teslya, N. N. [2]
Affiliations
[1] Olymp Corp, Moscow 121205, Russia
[2] Russian Acad Sci SPIIRAS, St Petersburg Inst Informat & Automat, St Petersburg 199178, Russia
Funding
Russian Foundation for Basic Research
Keywords
classification; binary classification; topic modeling; natural language processing; SUPPORT;
DOI
10.3103/S0147688219050034
Chinese Library Classification (CLC) number
G25 [Library science, librarianship]; G35 [Information science, information work]
Discipline classification code
1205; 120501
Abstract
An approach to text document classification is proposed that uses a probabilistic topic model whose training set contains documents of only one class. The approach makes it possible to identify positive samples (samples resembling the target class) in collections and streams of text documents. The article reviews models designed for text document classification that are trained on samples of a single class and describes their key features. The Positive Example Based Learning-TM classification model is presented, and a software prototype that implements it as a basis for classifying text documents is developed. Despite having no information about negative document samples, the model demonstrates high classification accuracy that exceeds that of alternative approaches. The superiority of the Positive Example Based Learning-TM model in classification accuracy on a small training set is demonstrated experimentally.
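The idea of training on positive samples only can be illustrated with a minimal sketch. It is not the Positive Example Based Learning-TM algorithm, which the abstract does not specify. As a stand-in assumption, a single smoothed word distribution (a one-topic unigram model) is estimated from positive documents, and a new document is scored by its mean per-word log-ratio against a uniform baseline. The function names, the smoothing constant `alpha`, and the threshold of 0 are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(".,;:!?()\"'") for w in text.lower().split())
    return [w for w in words if w]


def train_positive_model(positive_docs, alpha=0.1):
    """Estimate smoothed word probabilities from positive documents only.

    Assumption: one word distribution stands in for a full topic model.
    """
    counts = Counter(w for doc in positive_docs for w in tokenize(doc))
    vocab_size = len(counts) + 1  # one extra slot shared by all unseen words
    total = sum(counts.values()) + alpha * vocab_size
    probs = {w: (c + alpha) / total for w, c in counts.items()}
    unseen_prob = alpha / total
    return probs, unseen_prob, vocab_size


def score(doc, model):
    """Mean per-word log-ratio of the positive model to a uniform baseline."""
    probs, unseen_prob, vocab_size = model
    words = tokenize(doc)
    if not words:
        return float("-inf")  # an empty document gives no evidence
    uniform = 1.0 / vocab_size
    log_ratios = (math.log(probs.get(w, unseen_prob) / uniform) for w in words)
    return sum(log_ratios) / len(words)


def is_positive(doc, model, threshold=0.0):
    """Label a document positive when its score exceeds the threshold."""
    return score(doc, model) > threshold


# Hypothetical toy data: three positive documents, no negative ones.
model = train_positive_model([
    "topic model text classification",
    "probabilistic topic model for text documents",
    "text classification with topic model",
])
for candidate in ("text classification topic model",
                  "football match results tonight"):
    print(candidate, "->", is_positive(candidate, model))
```

In practice the threshold would have to be tuned, for example on held-out positive documents, since no negative samples are available to calibrate it.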
Pages: 314-320
Number of pages: 7