Home Location Identification of Twitter Users

Cited by: 89
Authors
Mahmud, Jalal
Nichols, Jeffrey
Drews, Clemens
Affiliations
[1] IBM Research, 650 Harry Rd, San Jose
Keywords
Algorithms; Design; Experimentation; Human Factors; Location; tweets; time zone
DOI
10.1145/2528548
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present a new algorithm for inferring the home location of Twitter users at different granularities, including city, state, time zone, or geographic region, using the content of users' tweets and their tweeting behavior. Unlike existing approaches, our algorithm uses an ensemble of statistical and heuristic classifiers to predict locations and makes use of a geographic gazetteer dictionary to identify place-name entities. We find that a hierarchical classification approach, where time zone, state, or geographic region is predicted first and city is predicted next, can improve prediction accuracy. We have also analyzed movement variations of Twitter users, built a classifier to predict whether a user was travelling in a certain period of time, and used it to further improve the location detection accuracy. Experimental evidence suggests that our algorithm works well in practice and outperforms the best existing algorithms for predicting the home location of Twitter users.
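The hierarchical idea in the abstract (predict a coarse region such as a time zone first, then a city within it) can be illustrated with a minimal sketch. This is not the authors' implementation: a small multinomial naive Bayes classifier stands in for their ensemble of statistical and heuristic classifiers, the class names `TinyNaiveBayes` and `HierarchicalLocator` are hypothetical, and the example tweets are invented toy data.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Multinomial naive Bayes over whitespace tokens with add-one smoothing.

    Assumption: a simple stand-in for the paper's classifier ensemble.
    """

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.label_counts = Counter(labels)      # label -> number of documents
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = text.lower().split()
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)  # smoothed likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


class HierarchicalLocator:
    """Predict a coarse region first, then a city among that region's cities."""

    def fit(self, texts, regions, cities):
        self.region_model = TinyNaiveBayes().fit(texts, regions)
        by_region = defaultdict(list)
        for text, region, city in zip(texts, regions, cities):
            by_region[region].append((text, city))
        # One city classifier per region, trained only on that region's tweets.
        self.city_models = {}
        for region, rows in by_region.items():
            r_texts, r_cities = zip(*rows)
            self.city_models[region] = TinyNaiveBayes().fit(r_texts, r_cities)
        return self

    def predict(self, text):
        region = self.region_model.predict(text)
        city = self.city_models[region].predict(text)
        return region, city


# Invented toy data, for illustration only.
texts = [
    "stuck on the bart again giants game tonight",
    "fog over the golden gate bridge",
    "hollywood traffic on the 405 lakers tonight",
    "beach day in santa monica",
    "subway delays in brooklyn yankees game",
    "pizza in manhattan near times square",
    "red sox at fenway tonight",
    "walking the freedom trail by the harbor",
]
regions = ["Pacific"] * 4 + ["Eastern"] * 4
cities = ["San Francisco"] * 2 + ["Los Angeles"] * 2 + ["New York"] * 2 + ["Boston"] * 2

locator = HierarchicalLocator().fit(texts, regions, cities)
print(locator.predict("yankees game in brooklyn"))
```

Restricting the second stage to cities inside the predicted region shrinks the label space the city classifier has to separate, which is one plausible reading of why the abstract reports an accuracy gain from the hierarchical setup.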
Pages: 21