Spatio-Temporal Multiple Geo-Location Identification on Twitter

Cited by: 0
Authors
Ghoorchian, Kambiz [1]
Girdzijauskas, Sarunas [1]
Affiliations
[1] KTH Royal Inst Technol, Sch Elect Engn & Comp Sci EECS, Stockholm, Sweden
Source
2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) | 2018
Keywords
Geo-Location Identification; Graph Partitioning; Social Network Analysis; Spatio-Temporal Analysis;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Twitter Geo-tags that indicate the exact location of messages have many applications, from localized opinion mining during elections to efficient traffic management in critical situations. However, less than 6% of Tweets are Geo-tagged, which limits the implementation of those applications. There are two groups of solutions: content-based and network-based. The first group uses location-indicative factors like URLs and topics, extracted from the content of tweets, to infer Geo-location for non-geoactive users, whereas the second group benefits from friendship ties in the underlying social network graph. Friendship ties are better predictors than content information because they are less noisy and often follow natural human spatial movement patterns. However, their prediction accuracy is still limited because they ignore the temporal aspects of human behavior and always assume a single location per user. This research aims to extend the current network-based approaches by taking users' temporal dimension into account. We assume multiple locations per user during different time-slots and hypothesize that location predictability varies depending on the time and the properties of the social membership group. Thus, we propose a hierarchical solution that applies temporal categorizations on top of social network partitioning to predict multiple locations for users in Online Social Networks (OSNs) like Twitter. Given a large-scale Twitter dataset, we show that users' location predictability exhibits different behavior in different time-slots and different social groups. We find that there are specific conditions where users are more predictable in terms of Geo-location. Our solution outperforms the state-of-the-art by improving the prediction accuracy by 16.6% in terms of Median Error Distance (MED) over the same recall.
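The abstract reports accuracy as Median Error Distance (MED). As a rough illustration only, and not the authors' implementation, the sketch below assumes that MED is the median great-circle distance between predicted and true coordinates, computed with the haversine formula on a spherical Earth. The function names, the `(latitude, longitude)` tuple layout, and the Earth radius constant are assumptions made for this example.

```python
import math
from statistics import median

# Assumed mean Earth radius in kilometres (spherical approximation).
EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def median_error_distance(predicted, actual):
    """Median of the per-prediction error distances, in km.

    `predicted` and `actual` are equal-length sequences of (lat, lon)
    tuples; each pair is one prediction, e.g. one user in one time-slot.
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return median(haversine_km(p, t) for p, t in zip(predicted, actual))
```

Under the multiple-location setting described in the abstract, each (user, time-slot) pair could contribute one entry to `predicted` and `actual`, so the same function can be applied per time-slot or per social group to compare predictability across them.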
Pages: 3412-3421
Page count: 10
Related papers
30 in total
  • [1] [Anonymous], 2012, PROC 5 ACM INT C WEB, DOI DOI 10.1145/2124295.2124380
  • [2] [Anonymous], 2010, EMNLP
  • [3] [Anonymous], 2011, P ACM SIGKDD INT C K, DOI DOI 10.1145/2020408.2020579
  • [4] [Anonymous], 2013, 24 ACM C HYP SOC MED, DOI [DOI 10.1145/2481492.2481494, 10.1145/2481492.2481494]
  • [5] Latent Dirichlet allocation
    Blei, DM
    Ng, AY
    Jordan, MI
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2003, 3 (4-5) : 993 - 1022
  • [6] Cheng Zhiyuan, 2010, PROC 19 ACM INT C IN, P759, DOI DOI 10.1145/1871437.1871535
  • [7] Clauset A, 2004, PHYS REV E, V70, DOI 10.1103/PhysRevE.70.066111
  • [8] Compton R., 2014, CORR
  • [9] MAXIMUM LIKELIHOOD FROM INCOMPLETE DATA VIA EM ALGORITHM
    DEMPSTER, AP
    LAIRD, NM
    RUBIN, DB
    [J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01): : 1 - 38
  • [10] Goldenberg J., 2009, CoRR