Short text classification using semantically enriched topic model

Times Cited: 1
Authors
Uddin, Farid [1 ]
Chen, Yibo [2 ]
Zhang, Zuping [1 ,3 ]
Huang, Xin [2 ]
Affiliations
[1] Cent South Univ, Sch Comp Sci & Engn, Changsha, Peoples R China
[2] State Grid Hunan Elect Power Co Ltd, Informat & Commun Branch, Changsha, Peoples R China
[3] Cent South Univ, Sch Comp Sci & Engn, 932 Lushan South Rd, Changsha 410083, Hunan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Machine learning; multi-level semantics; short text; text classification; topic model;
DOI
10.1177/01655515241230793
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Modelling short text is challenging due to the sparsity of word co-occurrences and insufficient semantic information, which affects downstream Natural Language Processing (NLP) tasks such as text classification. Gathering information from external sources is expensive and may introduce noise. For efficient short text classification without depending on external knowledge sources, we propose Expressive Short text Classification (EStC). EStC consists of a novel document context-aware, semantically enriched topic model called the Short text Topic Model (StTM), which captures word, topic and document semantics in a joint learning framework. In StTM, the probability of predicting a context word involves the topic distribution of word embeddings and the document vector as the global context, which is obtained by weighted averaging of word embeddings on the fly, simultaneously with the topic distribution of words, without requiring an additional inference method for the document embedding. EStC represents documents in an expressive (number of topics × number of word-embedding features) embedding space and uses a linear support vector machine (SVM) classifier for their classification. Experimental results demonstrate that EStC outperforms many state-of-the-art language models in short text classification on several publicly available short text data sets.
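The abstract's core idea — combining a document's topic distribution with a document vector obtained by weighted averaging of word embeddings, yielding a (number of topics × embedding dimension) representation fed to a linear SVM — can be sketched roughly as follows. This is a minimal illustration of that representation, not the authors' StTM implementation: the random embeddings, topic-word distributions, word weights and dimensions are all invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

num_topics, emb_dim, vocab = 4, 8, 20
word_emb = rng.normal(size=(vocab, emb_dim))            # stand-in word embeddings
topic_word = rng.dirichlet(np.ones(vocab), num_topics)  # per-topic word distributions

def doc_representation(word_ids, weights):
    """Flattened (num_topics x emb_dim) feature vector for one short document:
    outer product of the document's topic distribution with its document vector,
    where the document vector is a weighted average of its word embeddings."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    doc_vec = w @ word_emb[word_ids]          # global context by weighted averaging
    theta = topic_word[:, word_ids] @ w       # unnormalised topic mass over doc words
    theta = theta / theta.sum()               # document topic distribution
    return np.outer(theta, doc_vec).ravel()   # expressive (topics x features) space

feat = doc_representation([1, 5, 7], [0.5, 0.3, 0.2])
print(feat.shape)  # (32,)
```

Such fixed-length features could then be classified with any linear SVM (e.g. scikit-learn's `LinearSVC`), as the abstract describes for EStC.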
Pages: 481-498
Page count: 18