Document Similarity Measure Based on Topic Model

Authors
He, Ming [1]
Wang, Zhen-zhen [1]
Du, Yong-ping [1]
Affiliations
[1] Beijing Univ Technol, Coll Comp Sci, Beijing, Peoples R China
Source
APPLIED SCIENCE, MATERIALS SCIENCE AND INFORMATION TECHNOLOGIES IN INDUSTRY | 2014 / Vol. 513-517
Keywords
latent Dirichlet allocation; document similarity computation; topic model
DOI
10.4028/www.scientific.net/AMM.513-517.1280
Chinese Library Classification number
TU [Building Science]
Discipline classification code
0813
Abstract
Document similarity computation is an exciting research topic in information retrieval (IR) and a key issue for automatic document categorization, clustering analysis, fuzzy query, and question answering. Topic modeling is an emerging field in natural language processing (NLP), IR, and machine learning (ML). In this paper, we apply a latent Dirichlet allocation (LDA) topic model-based method to compute similarity between documents. By mapping a document from its term-space representation into a topic space, a distribution over topics is derived for computing document similarity. An empirical study using a real data set demonstrates the efficiency of our method.
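The abstract does not say which similarity function is applied to the topic distributions, so the following is only a minimal sketch under stated assumptions. It assumes each document has already been mapped to a distribution over topics by some LDA implementation, and it compares two such distributions with cosine similarity and with a Jensen-Shannon-based score. The function names and the example distributions are hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(p, q):
    """Cosine of the angle between two topic-proportion vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0.0)


def js_similarity(p, q):
    """1 minus the Jensen-Shannon divergence (base 2), so the score lies in [0, 1]."""
    m = [(a + b) / 2.0 for a, b in zip(p, q)]
    jsd = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return 1.0 - jsd


# Hypothetical document-topic distributions over three topics
# (assumed to come from an already-fitted LDA model).
doc_a = [0.70, 0.20, 0.10]
doc_b = [0.60, 0.30, 0.10]
doc_c = [0.05, 0.15, 0.80]

print(cosine_similarity(doc_a, doc_b), cosine_similarity(doc_a, doc_c))
print(js_similarity(doc_a, doc_b), js_similarity(doc_a, doc_c))
```

In this sketch, doc_a and doc_b put most of their mass on the same topic and therefore score as more similar than doc_a and doc_c under both measures. Whether the paper uses either of these measures is not stated in the abstract.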
Pages: 1280-1284
Page count: 5