Ontology-driven discovery of geospatial evidence in web pages

Cited by: 10
Authors
Borges, Karla A. V. [1, 2]
Davis, Clodoveu A., Jr. [1]
Laender, Alberto H. F. [1]
Medeiros, Claudia Bauzer [3]
Affiliations
[1] Univ Fed Minas Gerais, Dept Ciencia Comp, BR-31270010 Belo Horizonte, MG, Brazil
[2] PRODABEL Empresa Informat & Informacao Municipio, BR-31230000 Belo Horizonte, MG, Brazil
[3] Univ Estadual Campinas, Inst Informat, BR-13083970 Campinas, SP, Brazil
Keywords
Geographic information retrieval; Extraction ontologies; Geospatial evidence in text; Positioning expressions; Geocoding; POINT; MODEL
DOI
10.1007/s10707-010-0118-z
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
When users need to find something on the Web that is related to a place, chances are place names will be submitted along with some other keywords to a search engine. However, automatic recognition of geographic characteristics embedded in Web documents, which would allow for a better connection between documents and places, remains a difficult task. We propose an ontology-driven approach to facilitate the process of recognizing, extracting, and geocoding partial or complete references to places embedded in text. Our approach combines an extraction ontology with urban gazetteers and geocoding techniques. This ontology, called OnLocus, is used to guide the discovery of geospatial evidence from the contents of Web pages. We show that addresses and positioning expressions, along with fragments such as postal codes or telephone area codes, provide satisfactory support for local search applications, since they are able to determine approximations to the physical location of services and activities named within Web pages. Our experiments show the feasibility of performing automated address extraction and geocoding to identify locations associated with Web pages. Combining location identifiers with basic addresses improved the precision of extractions and reduced the number of false-positive results.
Pages: 609-631
Page count: 23