Effective Seed-Guided Topic Labeling for Dataless Hierarchical Short Text Classification

Cited by: 1
Authors
Yang, Yi [1, 3]
Wang, Hongan [1, 3]
Zhu, Jiaqi [1, 2, 3]
Shi, Wandong [1, 3]
Guo, Wenli [1]
Zhang, Jiawen [1, 3]
Affiliations
[1] Chinese Acad Sci, Inst Software, SKLCS, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
[3] Univ Chinese Acad Sci, Beijing, Peoples R China
Source
WEB ENGINEERING, ICWE 2021 | 2021 / Vol. 12706
Keywords
Hierarchical text classification; Topic model; Seed word
DOI
10.1007/978-3-030-74296-6_21
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Hierarchical text classification, which aims to classify texts into a given hierarchy, has broad applications on the Internet. Supervised methods require large amounts of labeled data and are therefore costly. To reduce this cost, dataless hierarchical text classification, which requires only a few relevant seed words for each given category, has attracted growing attention in recent years. However, existing approaches mainly focus on long texts and do not consider the characteristics of short texts, so they are unsuitable in many scenarios. In this paper, we tackle dataless hierarchical short text classification for the first time and propose a model named Hierarchical Seeded Biterm Topic Model (HierSeedBTM), which leverages seed words in the Biterm Topic Model (BTM) to guide hierarchical topic labeling. Specifically, our model introduces an iterative distribution propagation mechanism among topic models at different levels to incorporate information about the hierarchical structure. Experiments on two public datasets show that the proposed model is more effective than state-of-the-art dataless hierarchical text classification methods designed for long texts.
Pages: 271-285
Number of pages: 15