Quarry Meaning: A Topic Model Application focused on Spanish Documents

Cited by: 0
Authors
Acosta, Olga [1]
Aguilar, Cesar [1]
Araya, Fabiola [1]
Affiliations
[1] Pontificia Univ Catolica Chile, Fac Letras, Campus San Joaquin, Santiago, Chile
Source
PROCESAMIENTO DEL LENGUAJE NATURAL | 2018 / Issue 61
Keywords
Natural language processing; text mining; topic modeling; contrastive approach; text classification;
DOI
10.26342/2018-61-31
Chinese Library Classification number
H0 [Linguistics];
Discipline classification code
030303 ; 0501 ; 050102 ;
Abstract
This demo presents a standalone application for easily training and testing a topic model. The application includes filters that reduce noise in the results. On the one hand, a base stop-list is included, and it can be complemented with a list of non-relevant words proposed by the user or obtained through a contrastive approach using a reference corpus. On the other hand, words with high semantic value can be selected using POS tags. We also include a word-cloud visualization, in which ten topics can be shown, so that the results can be analyzed in detail. Finally, the evaluation focused on using the topic model to classify documents. Our model achieved precision above 95% on the test set.
Pages: 197-200
Page count: 4