Classification of Text Documents Based on a Probabilistic Topic Model

Cited by: 0

Authors:

Karpovich, S. N. [1]

Smirnov, A. V. [2]

Teslya, N. N. [2]

Affiliations:

[1] Olymp Corp, Moscow 121205, Russia

[2] Russian Acad Sci SPIIRAS, St Petersburg Inst Informat & Automat, St Petersburg 199178, Russia

Source:

SCIENTIFIC AND TECHNICAL INFORMATION PROCESSING | 2019 / Vol. 46 / No. 05

Funding:

Russian Foundation for Basic Research

Keywords:

classification; binary classification; topic modeling; natural language processing; SUPPORT;

DOI:

10.3103/S0147688219050034

Chinese Library Classification (CLC) number:

G25 [Library science, librarianship]; G35 [Information science, information work]

Discipline classification code:

1205 ; 120501 ;

Abstract:

An approach to text document classification is proposed that uses a probabilistic topic model whose training set contains documents of only one class. The approach makes it possible to identify positive samples (samples resembling the target class) in collections and streams of text documents. The article considers models designed for text document classification and trained on samples of a single class, and describes their key features. The Positive Example Based Learning-TM classification model is presented, and a software prototype that implements it as a basis for classifying text documents is developed. Despite having no information about negative document samples, the model demonstrates a high level of classification accuracy that exceeds the performance of alternative approaches. Experiments show that the Positive Example Based Learning-TM model achieves higher classification accuracy than the alternatives when the training set is small.


Pages: 314-320

Page count: 7
