Classification of Keyphrases using Random Forest

Cited by: 0
Authors
Tovar Vidal, Mireya [1]
Flores Petlacalco, Gerardo [1]
Montes Rendon, Azucena [2]
Contreras Gonzalez, Meliza [1]
Cervantes Marquez, Ana Patricia [1]
Affiliations
[1] Benemerita Univ Autonoma Puebla, Fac Comp Sci, Puebla, Mexico
[2] Inst Tecnol Tlalpan, TecNM, Mexico City, DF, Mexico
Source
PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE (ICPRAI 2018) | 2018
Keywords
Keyphrases; Natural Language Processing; Machine Learning; Latent Semantic Analysis
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Keyphrases are words or phrases from a document that describe its meaning. A keyphrase captures the general idea of a document and implicitly reflects the resources the authors used during their research to achieve their goal. Classification models are therefore needed that group keyphrases according to their content to simplify reading. In this paper, a classification of keyphrases from scientific publications based on Latent Semantic Analysis (LSA) and several classification techniques is proposed and implemented. The aim is to create a classification model based on features extracted from the input corpus, without enriching it with external resources such as Wikipedia or other online resources. The classes considered are process, task, and material, drawn from publications in the Computer Science, Materials Science, and Physics domains. Results show that Random Forest was the best technique for classifying keyphrases, with an F1-measure of 60%.
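The approach summarized in the abstract (features derived from the corpus itself, an LSA projection, then a Random Forest classifier scored with F1) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not give the feature set, the LSA dimensionality, the forest parameters, or how F1 was averaged, so the TF-IDF features, `n_components=3`, `n_estimators=50`, and macro averaging are all assumptions. The toy keyphrases are invented and do not come from the paper's corpus.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

# Invented toy keyphrases (assumption: not taken from the paper's data),
# labeled with the three classes named in the abstract.
phrases = [
    "finite element simulation", "gradient descent optimization", "monte carlo sampling",
    "image segmentation", "speech recognition", "keyphrase extraction",
    "silicon wafer", "carbon nanotubes", "training corpus",
]
labels = ["process"] * 3 + ["task"] * 3 + ["material"] * 3

# TF-IDF term weights followed by truncated SVD is a common way to compute LSA;
# the reduced vectors are then fed to a Random Forest.
# All hyperparameters here are illustrative assumptions.
model = make_pipeline(
    TfidfVectorizer(),
    TruncatedSVD(n_components=3, random_state=0),
    RandomForestClassifier(n_estimators=50, random_state=0),
)
model.fit(phrases, labels)

# Scoring on the training phrases only demonstrates the metric call;
# a real evaluation would use a held-out test set.
predicted = model.predict(phrases)
macro_f1 = f1_score(labels, predicted, average="macro")
print(list(predicted), macro_f1)
```

No external resource is used here: every feature comes from the input phrases, which mirrors the constraint stated in the abstract.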
Pages: 506-511
Page count: 6