Home Location Identification of Twitter Users

Cited by: 89
Authors
Mahmud, Jalal
Nichols, Jeffrey
Drews, Clemens
Affiliation
[1] IBM Research, 650 Harry Rd, San Jose
Keywords
Algorithms; Design; Experimentation; Human Factors; Location; Tweets; Time zone
DOI
10.1145/2528548
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We present a new algorithm for inferring the home location of Twitter users at different granularities, including city, state, time zone, or geographic region, using the content of users' tweets and their tweeting behavior. Unlike existing approaches, our algorithm uses an ensemble of statistical and heuristic classifiers to predict locations and makes use of a geographic gazetteer dictionary to identify place-name entities. We find that a hierarchical classification approach, where time zone, state, or geographic region is predicted first and city is predicted next, can improve prediction accuracy. We also analyzed movement variations of Twitter users, built a classifier to predict whether a user was travelling during a given period, and used its output to further improve location detection accuracy. Experimental evidence suggests that our algorithm works well in practice and outperforms the best existing algorithms for predicting the home location of Twitter users.
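The hierarchical ensemble idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `GAZETTEER` entries, the two cue-counting classifiers, and the function names (`cue_classifier`, `majority_vote`, `predict_home`) are all hypothetical stand-ins, and simple majority voting is assumed as the ensemble rule. The sketch predicts a time zone first, then chooses a city only among cities in that zone.

```python
from collections import Counter

# Hypothetical toy gazetteer (illustrative entries, not data from the paper).
GAZETTEER = {
    "San Jose": {"zone": "Pacific", "names": ["san jose"], "words": ["sharks"]},
    "Seattle": {"zone": "Pacific", "names": ["seattle"], "words": ["seahawks"]},
    "New York": {"zone": "Eastern", "names": ["new york"], "words": ["yankees"]},
    "Boston": {"zone": "Eastern", "names": ["boston"], "words": ["red sox"]},
}


def cue_classifier(tweets, cues):
    """Pick the label whose cue terms occur most often; None if nothing matches."""
    text = " ".join(tweets).lower()
    scores = {label: sum(text.count(t) for t in terms) for label, terms in cues.items()}
    best = max(scores, key=scores.get, default=None)
    return best if best is not None and scores[best] > 0 else None


def majority_vote(predictions):
    """Return the most common non-None label, or None if every classifier abstained."""
    votes = Counter(p for p in predictions if p is not None)
    return votes.most_common(1)[0][0] if votes else None


def build_cues(cities, kind, by_zone):
    """Collect cue terms of one kind ('names' or 'words') per zone or per city."""
    cues = {}
    for city in cities:
        label = GAZETTEER[city]["zone"] if by_zone else city
        cues.setdefault(label, []).extend(GAZETTEER[city][kind])
    return cues


def ensemble_predict(tweets, cities, by_zone):
    """Run one classifier per cue kind and combine their outputs by voting."""
    votes = [
        cue_classifier(tweets, build_cues(cities, kind, by_zone))
        for kind in ("names", "words")
    ]
    return majority_vote(votes)


def predict_home(tweets):
    """Hierarchical prediction: time zone first, then a city within that zone."""
    zone = ensemble_predict(tweets, list(GAZETTEER), by_zone=True)
    if zone is None:
        return None, None
    in_zone = [c for c in GAZETTEER if GAZETTEER[c]["zone"] == zone]
    return zone, ensemble_predict(tweets, in_zone, by_zone=False)


tweets = [
    "Go Sharks!",
    "Traffic in San Jose again",
    "San Jose sunsets",
    "Visiting Boston next week",
]
print(predict_home(tweets))
```

Restricting the second stage to cities in the predicted zone is what makes the sketch hierarchical: the single mention of Boston cannot win the city vote once the coarse stage has settled on the Pacific zone.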
Pages: 21