A deep neural network model for Chinese toponym matching with geographic pre-training model

Cited by: 2

Authors:

Qiu, Qinjun [1,2,3,4]

Zheng, Shiyu [2]

Tian, Miao [1]

Li, Jiali [2]

Ma, Kai [3]

Tao, Liufeng [2,4]

Xie, Zhong [2,4]

Affiliations:

[1] China Univ Geosci, Key Lab Geol Survey & Evaluat, Minist Educ, Wuhan 430074, Peoples R China

[2] China Univ Geosci, Sch Comp Sci, Wuhan 430074, Peoples R China

[3] China Three Gorges Univ, Hubei Key Lab Intelligent Vis Based Monitoring Hyd, Yichang 443002, Peoples R China

[4] Minist Nat Resources Key Lab Quantitat Resources A, Wuhan 430074, Peoples R China

Source:

INTERNATIONAL JOURNAL OF DIGITAL EARTH | 2024 / Vol. 17 / No. 01

Funding:

National Key Research and Development Program of China;

Keywords:

Toponym matching; deep learning; pre-training model; geographic information retrieval; natural language processing; RECOGNITION;

DOI:

10.1080/17538947.2024.2353111

Chinese Library Classification (CLC) number:

P9 [Physical Geography];

Discipline classification code:

0705 ; 070501 ;

Abstract:

Many tasks in geographical information retrieval and geographical information science require toponym matching, that is, aligning toponyms that share a common referent. String similarity approaches struggle with unofficial and/or historical variants of the same toponym. In addition, current state-of-the-art supervised machine learning approaches and tools rely on labeled samples and do not adequately handle character replacements arising from transliteration or from historical shifts in linguistic and cultural norms. To address these issues, this paper proposes a matching approach based on a deep neural network model built on a geographic language representation model, GeoBERT (geographic Bidirectional Encoder Representations from Transformers). The model builds on GeoBERT by extending a generalized Enhanced Sequential Inference Model architecture and integrating multiple features to improve the accuracy and robustness of toponym matching. The proposed method is evaluated on three large datasets. The results show that the approach outperforms the individual similarity metrics used in previous studies.
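The limitation of string similarity noted in the abstract can be illustrated with a minimal sketch. This is not the paper's method or one of its baselines as implemented by the authors. It assumes a plain Levenshtein edit distance normalized by the longer string, and the function names `levenshtein` and `similarity` as well as the example toponyms are chosen here for illustration only.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (dynamic programming, two rows)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string on the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Illustrative normalization (an assumption): 1 - distance / longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


# 北平 (Beiping) is a historical name of 北京 (Beijing): same referent.
# 南京 (Nanjing) is a different city: different referent.
print(similarity("北京", "北平"))  # 0.5
print(similarity("北京", "南京"))  # 0.5
```

Both pairs receive the same score although only the first shares a referent, so a character-level metric alone cannot separate a historical variant from a different place with a similar name. That gap is what motivates adding learned semantic representations.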


Pages: 24

