A quantitative analysis of global gazetteers: Patterns of coverage for common feature types

Times cited: 38
Authors
Acheson, Elise [1]
De Sabbata, Stefano [2]
Purves, Ross S. [1]
Affiliations
[1] Univ Zurich, Dept Geog, Winterthurerstr 190, CH-8057 Zurich, Switzerland
[2] Univ Leicester, Dept Geog, Univ Rd, Leicester LE1 7RH, Leics, England
Keywords
Gazetteers; Data quality; GeoNames; Placenames; Geocoding; GEOGRAPHICAL INFORMATION; RETRIEVAL;
DOI
10.1016/j.compenvurbsys.2017.03.007
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
Gazetteers are important tools used in a wide variety of workflows that depend on linking natural language text to geographical space. The spatial properties of these data sources, such as coverage, balance, and completeness, affect the performance of common tasks such as geoparsing and geocoding. However, little attention has focused on how these properties vary in global gazetteers, particularly across country boundaries and according to feature types. In this paper, we present a detailed investigation of the spatial properties of two open gazetteers with worldwide coverage: GeoNames, and the Getty Thesaurus of Geographic Names (TGN). Using point density maps, correlations, and linear regressions, we analyze the global spatial coverage of each data source for the full set of features and for top feature types: populated places, streams, mountains, and hills. Results show wide discrepancies in coverage between the two datasets, sharp changes in feature type coverage across country borders, and idiosyncratic patterns dominated by a few countries for the more sparsely covered natural features. As more and more systems rely on recognizing and grounding named places, these patterns can influence the analysis of growing amounts of online text content and reinforce or amplify existing inequalities. © 2017 The Author. Published by Elsevier Ltd.
Pages: 309-320
Number of pages: 12