Selecting article segment titles based on keyphrase features and semantic relatedness

Times cited: 0
Authors
Guo, Yuming [1]
Iwaihara, Mizuho [1]
Affiliation
[1] Waseda Univ, Grad Sch Informat Prod & Syst, Fukuoka, Fukuoka, Japan
Source
2018 7TH INTERNATIONAL CONGRESS ON ADVANCED APPLIED INFORMATICS (IIAI-AAI 2018) | 2018
Keywords
titling documents; semantic relatedness; keyphrase extraction; document summarization
DOI
10.1109/IIAI-AAI.2018.00034
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Nowadays people can find almost any kind of information they want on the Internet. In most cases, however, users are unwilling to spend much time browsing long texts to locate their target segment among long paragraphs. Existing work on topic labeling is effective for document categorization but inadequate at the granularity of detailed contents. We therefore propose a method for selecting titles for segments in long documents. We analyze the characteristics of high-quality titles for article segments in terms of the semantic relatedness between the target segment and related articles, as well as other segments. We then revise three previously proposed features. We improve the phraseness feature so that it assigns appropriate scores to long titles, and we combine the SimPF and Embedding-vector features to improve efficiency and rationality. We use Wikipedia articles for experimental evaluation: Wikipedia contains a large number of manually titled article segments, while many articles still lack detailed segment titles. We evaluate the scoring functions by the rank at which the hidden original segment titles appear, measured by precision@K. Through rigorous evaluations, we identify an optimal combination of the features.
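The abstract does not state how the features are combined or exactly how precision@K is computed over segments. The Python sketch below illustrates one plausible evaluation loop under stated assumptions: candidate titles are ranked by a weighted linear sum of feature scores, and precision@K is taken as the fraction of segments whose hidden original title appears among the top K candidates. The feature functions (`overlap`, `length`), the weights, and the toy segment are hypothetical stand-ins, not the paper's phraseness, SimPF, or Embedding-vector features.

```python
from typing import Callable, Dict, List, Sequence

# A feature maps (segment_text, candidate_title) to a score.
# ASSUMPTION: the paper's actual features are not reproduced here.
FeatureFn = Callable[[str, str], float]


def combined_score(segment: str, candidate: str,
                   features: Dict[str, FeatureFn],
                   weights: Dict[str, float]) -> float:
    """Weighted linear combination of feature scores (an assumed scheme)."""
    return sum(weights[name] * fn(segment, candidate)
               for name, fn in features.items())


def rank_candidates(segment: str, candidates: Sequence[str],
                    features: Dict[str, FeatureFn],
                    weights: Dict[str, float]) -> List[str]:
    """Return candidate titles sorted by descending combined score."""
    return sorted(candidates,
                  key=lambda c: combined_score(segment, c, features, weights),
                  reverse=True)


def precision_at_k(rankings: Sequence[Sequence[str]],
                   hidden_titles: Sequence[str], k: int) -> float:
    """Fraction of segments whose hidden original title is in the top k.

    ASSUMPTION: one hidden gold title per segment.
    """
    if not rankings:
        return 0.0
    hits = sum(1 for ranked, gold in zip(rankings, hidden_titles)
               if gold in ranked[:k])
    return hits / len(rankings)


# --- Hypothetical toy features, for illustration only ---
def overlap(segment: str, candidate: str) -> float:
    """Share of candidate words that also occur in the segment."""
    seg_words = set(segment.lower().split())
    words = candidate.lower().split()
    return sum(w in seg_words for w in words) / len(words) if words else 0.0


def length(segment: str, candidate: str) -> float:
    """Number of words in the candidate (a crude long-title bonus)."""
    return float(len(candidate.split()))


features: Dict[str, FeatureFn] = {"overlap": overlap, "length": length}
weights = {"overlap": 1.0, "length": 0.1}  # hypothetical weights

segment = "the history of the castle begins in the twelfth century"
candidates = ["History", "Castle history", "Architecture"]
ranked = rank_candidates(segment, candidates, features, weights)
# ranked == ["Castle history", "History", "Architecture"]
```

With the hidden title "History", this toy segment counts as a miss at K=1 and a hit at K=2. Searching for the "optimum combination" would then amount to comparing precision@K across different weight settings or feature subsets.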
Pages: 129-132
Number of pages: 4