Relation between Titles and Keywords in Japanese Academic Papers using Quantitative Analysis and Machine Learning

Cited by: 0
Authors
Murata, Masaki [1 ]
Morimoto, Natsumi [1 ]
Affiliations
[1] Tottori Univ, Fac Engn, Tottori, Japan
Source
COMPUTACION Y SISTEMAS | 2019 / Vol. 23 / No. 03
Keywords
Thesis; title; keyword; machine learning; feature analysis;
DOI
10.13053/CyS-23-3-3255
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
In this study, we analyzed the keywords of academic papers using data from more than 300 papers, combining quantitative surveys with machine learning. The findings from these surveys and analyses are expected to support the automatic assignment of keywords to papers. The number of keywords included in a paper is expressed quantitatively using the covering rate and density of keywords. The results confirm that paper titles are likely to include keywords. We also used machine learning to predict which words can serve as keywords; the proposed method achieves an accuracy in the range of 0.6 to 0.8. In addition, by analyzing the features used in machine learning, we obtain the characteristics of the words that are given as keywords in papers.
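The abstract names two measures, covering rate and density, without defining them. As a rough illustration only, the sketch below assumes that covering rate is the share of a paper's keywords that appear in a text (for example, the title) and that density is the share of the text's tokens that are keywords. Both definitions, the function names, and the toy data are assumptions made for this example and are not taken from the paper. Japanese text would first need morphological segmentation, so the functions take already tokenized input.

```python
def covering_rate(keywords, tokens):
    """Share of keywords that occur among the tokens (assumed definition)."""
    if not keywords:
        return 0.0
    token_set = set(tokens)
    hits = sum(1 for k in keywords if k in token_set)
    return hits / len(keywords)


def keyword_density(keywords, tokens):
    """Share of tokens that are keywords (assumed definition)."""
    if not tokens:
        return 0.0
    keyword_set = set(keywords)
    hits = sum(1 for t in tokens if t in keyword_set)
    return hits / len(tokens)


# Toy, hypothetical data: a pre-tokenized title and an author keyword list.
title_tokens = ["keyword", "analysis", "of", "paper", "title"]
paper_keywords = ["keyword", "title", "thesis", "feature"]

rate = covering_rate(paper_keywords, title_tokens)       # 2 of 4 keywords found
density = keyword_density(paper_keywords, title_tokens)  # 2 of 5 tokens are keywords
```

Under these assumed definitions, a high covering rate for titles would correspond to the reported finding that titles are likely to include keywords.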
Pages: 959-968
Page count: 10