Text-Based Twitter User Geolocation Prediction

Cited by: 158
Authors
Han, Bo [1, 2]
Cook, Paul [1]
Baldwin, Timothy [1, 2]
Affiliations
[1] Univ Melbourne, Melbourne, Vic 3010, Australia
[2] NICTA Victoria Res Lab, Melbourne, Vic, Australia
Funding
Australian Research Council
Keywords
NETWORK;
DOI
10.1613/jair.4200
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Geographical location is vital to geospatial applications like local search and event detection. In this paper, we investigate and improve on the task of text-based geolocation prediction of Twitter users. Previous studies on this topic have typically assumed that geographical references (e.g., gazetteer terms, dialectal words) in a text are indicative of its author's location. However, these references are often buried in informal, ungrammatical, and multilingual data, and are therefore non-trivial to identify and exploit. We present an integrated geolocation prediction framework and investigate what factors impact on prediction accuracy. First, we evaluate a range of feature selection methods to obtain "location indicative words". We then evaluate the impact of nongeotagged tweets, language, and user-declared metadata on geolocation prediction. In addition, we evaluate the impact of temporal variance on model generalisation, and discuss how users differ in terms of their geolocatability. We achieve state-of-the-art results for the text-based Twitter user geolocation task, and also provide the most extensive exploration of the task to date. Our findings provide valuable insights into the design of robust, practical text-based geolocation prediction systems.
Pages: 451-500
Page count: 50