A quantitative analysis of global gazetteers: Patterns of coverage for common feature types

Cited by: 38
Authors
Acheson, Elise [1]
De Sabbata, Stefano [2]
Purves, Ross S. [1]
Affiliations
[1] Univ Zurich, Dept Geog, Winterthurerstr 190, CH-8057 Zurich, Switzerland
[2] Univ Leicester, Dept Geog, Univ Rd, Leicester LE1 7RH, Leics, England
Keywords
Gazetteers; Data quality; GeoNames; Placenames; Geocoding; GEOGRAPHICAL INFORMATION; RETRIEVAL
DOI
10.1016/j.compenvurbsys.2017.03.007
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Gazetteers are important tools used in a wide variety of workflows that depend on linking natural language text to geographical space. The spatial properties of these data sources, such as coverage, balance, and completeness, affect the performance of common tasks such as geoparsing and geocoding. However, little attention has focused on how these properties vary in global gazetteers, particularly across country boundaries and according to feature types. In this paper, we present a detailed investigation of the spatial properties of two open gazetteers with worldwide coverage: GeoNames, and the Getty Thesaurus of Geographic Names (TGN). Using point density maps, correlations, and linear regressions, we analyze the global spatial coverage of each data source for the full set of features and for top feature types: populated places, streams, mountains, and hills. Results show wide discrepancies in coverage between the two datasets, sharp changes in feature type coverage across country borders, and idiosyncratic patterns dominated by a few countries for the more sparsely covered natural features. As more and more systems rely on recognizing and grounding named places, these patterns can influence the analysis of growing amounts of online text content and reinforce or amplify existing inequalities. © 2017 The Author. Published by Elsevier Ltd.
Pages: 309-320
Page count: 12