Vector-Based Similarity Measurements for Historical Figures

Cited by: 6
Authors
Chen, Yanqing [1]
Perozzi, Bryan [1]
Skiena, Steven [1]
Affiliation
[1] SUNY Stony Brook, Dept Comp Sci, Stony Brook, NY 11794 USA
Source
SIMILARITY SEARCH AND APPLICATIONS, SISAP 2015 | 2015 / Vol. 9371
Keywords
Vector representations; People similarity; Deepwalk
DOI
10.1007/978-3-319-25087-8_17
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Historical interpretation benefits from identifying analogies among famous people: Who are the Lincolns, Einsteins, Hitlers, and Mozarts? We investigate several approaches for converting approximately 600,000 historical figures into vector representations that quantify their similarity according to their Wikipedia pages. We adopt an effective reference standard based on the number of human-annotated Wikipedia categories that two figures share, and use it to demonstrate the performance of our similarity detection algorithms. In particular, we investigate four different unsupervised approaches to representing the semantic associations of individuals: (1) TF-IDF, (2) weighted average of distributed word embeddings, (3) LDA topic analysis, and (4) Deepwalk embedding from page links. All proved effective, but Deepwalk embedding yielded an overall accuracy of 91.33% in our evaluation to uncover historical analogies. Combining LDA and Deepwalk yielded even higher performance.
Pages: 179-190
Page count: 12
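
Illustrative sketch (not part of the record, and not the authors' implementation): the first approach named in the abstract, TF-IDF, can be pictured as turning each person's page text into a weighted term vector and comparing vectors by cosine similarity. The page names and token lists below are hypothetical toy data, the weighting (term frequency times log inverse document frequency) is one common TF-IDF variant chosen for illustration, and the helper names `tfidf_vectors` and `cosine` are assumptions introduced here.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per tokenized document.

    Assumed weighting: (count / doc length) * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy "pages"; real input would be tokenized Wikipedia text.
pages = {
    "composer_a": "composer symphony opera vienna piano".split(),
    "composer_b": "composer symphony piano sonata bonn".split(),
    "physicist_a": "physicist relativity theory nobel physics".split(),
}
names = list(pages)
vecs = dict(zip(names, tfidf_vectors([pages[name] for name in names])))

sim_composers = cosine(vecs["composer_a"], vecs["composer_b"])
sim_mixed = cosine(vecs["composer_a"], vecs["physicist_a"])
print(f"composer_a vs composer_b:  {sim_composers:.3f}")
print(f"composer_a vs physicist_a: {sim_mixed:.3f}")
```

On this toy data the two composer pages share weighted terms and so score higher than the composer/physicist pair, which shares none. The same cosine comparison would apply to vectors produced by the other approaches the abstract lists, though how the paper actually builds and compares those vectors is not stated in this record.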