Classification of Text Documents Based on a Probabilistic Topic Model

Cited by: 0
Authors
Karpovich, S. N. [1 ]
Smirnov, A. V. [2 ]
Teslya, N. N. [2 ]
Affiliations
[1] Olymp Corp, Moscow 121205, Russia
[2] Russian Acad Sci SPIIRAS, St Petersburg Inst Informat & Automat, St Petersburg 199178, Russia
Funding
Russian Foundation for Basic Research;
Keywords
classification; binary classification; topic modeling; natural language processing; SUPPORT;
DOI
10.3103/S0147688219050034
Chinese Library Classification (CLC) number
G25 [Library science, librarianship]; G35 [Information science, information work];
Subject classification code
1205 ; 120501 ;
Abstract
An approach to text document classification is proposed that uses a probabilistic topic model whose training document set contains objects of only one class. This approach makes it possible to identify positive samples (samples resembling the target class) in collections and streams of text documents. This article considers models created for text document classification and trained on samples of a single class, and describes their key features. The Positive Example Based Learning-TM classification model is presented, and a software prototype that implements it as a basis for classifying text documents is developed. Despite having no information about negative document samples, the model demonstrates a high level of classification accuracy that exceeds the performance of alternative approaches. Experiments demonstrate the superiority of the Positive Example Based Learning-TM model with respect to classification accuracy when a small training set is used.
Pages: 314-320
Page count: 7
Related papers
50 in total
  • [1] Classification of Text Documents Based on a Probabilistic Topic Model
    S. N. Karpovich
    A. V. Smirnov
    N. N. Teslya
    Scientific and Technical Information Processing, 2019, 46 : 314 - 320
  • [2] A classification-based summarisation model for summarising text documents
    Hannah, M.E. (hanmoses@yahoo.com), 1600, Inderscience Enterprises Ltd., 29, route de Pre-Bois, Case Postale 856, CH-1215 Geneva 15, CH-1215, Switzerland (06): : 292 - 308
  • [3] MULTILAYER ADAPTIVE FUZZY PROBABILISTIC NEURAL NETWORK IN CLASSIFICATION PROBLEMS OF TEXT DOCUMENTS
    Bodyanskiy, E. V.
    Ryabova, N. V.
    Zolotukhin, O. V.
    RADIO ELECTRONICS COMPUTER SCIENCE CONTROL, 2015, 1 : 39 - 45
  • [4] TextNetTopics Pro, a topic model-based text classification for short text by integration of semantic and document-topic distribution information
    Voskergian, Daniel
    Bakir-Gungor, Burcu
    Yousef, Malik
    FRONTIERS IN GENETICS, 2023, 14
  • [5] Hierarchical Method for Automated Text Documents Classification
    Mousa, Mohamed H.
    Khedr, Ayman E.
    Idrees, Amira M.
    INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2025, 22 (01) : 11 - 19
  • [6] Text classification method based on self-training and LDA topic models
    Pavlinek, Miha
    Podgorelec, Vili
    EXPERT SYSTEMS WITH APPLICATIONS, 2017, 80 : 83 - 93
  • [7] Probabilistic topic modeling for short text based on word embedding networks
    Marcelo Pita
    Matheus Nunes
    Gisele L. Pappa
    Applied Intelligence, 2022, 52 : 17829 - 17844
  • [8] Probabilistic topic modeling for short text based on word embedding networks
    Pita, Marcelo
    Nunes, Matheus
    Pappa, Gisele L.
    APPLIED INTELLIGENCE, 2022, 52 (15) : 17829 - 17844
  • [9] A probabilistic model for classification of multiple-record Web documents
    Tang, J
    Ng, YK
    OOIS 2000: 6TH INTERNATIONAL CONFERENCE ON OBJECT ORIENTED INFORMATION SYSTEMS, PROCEEDINGS, 2001, : 349 - 357
  • [10] A probabilistic topic model for event-based image classification and multi-label annotation
    Laib, Lakhdar
    Allili, Mohand Said
    Ait-Aoudia, Samy
    SIGNAL PROCESSING-IMAGE COMMUNICATION, 2019, 76 : 283 - 294