A deep neural network model for Chinese toponym matching with geographic pre-training model

Cited by: 2
Authors
Qiu, Qinjun [1, 2, 3, 4]
Zheng, Shiyu [2]
Tian, Miao [1]
Li, Jiali [2]
Ma, Kai [3]
Tao, Liufeng [2, 4]
Xie, Zhong [2, 4]
Affiliations
[1] China Univ Geosci, Key Lab Geol Survey & Evaluat, Minist Educ, Wuhan 430074, Peoples R China
[2] China Univ Geosci, Sch Comp Sci, Wuhan 430074, Peoples R China
[3] China Three Gorges Univ, Hubei Key Lab Intelligent Vis Based Monitoring Hyd, Yichang 443002, Peoples R China
[4] Minist Nat Resources Key Lab Quantitat Resources A, Wuhan 430074, Peoples R China
Funding
National Key Research and Development Program of China
Keywords
Toponym matching; deep learning; pre-training model; geographic information retrieval; natural language processing; RECOGNITION
DOI
10.1080/17538947.2024.2353111
Chinese Library Classification (CLC) number
P9 [Physical Geography]
Discipline code
0705; 070501
Abstract
Many tasks in geographic information retrieval and geographic information science require toponym matching: determining whether two toponyms refer to the same place. String similarity approaches struggle with unofficial and/or historical variants of the same toponym. Current state-of-the-art supervised machine learning approaches and tools, in turn, depend on labeled samples and do not adequately handle character replacements that arise from transliteration or from historical shifts in linguistic and cultural norms. To address these issues, this paper proposes a matching approach based on a deep neural network built on GeoBERT, a geographic language representation model (geographic Bidirectional Encoder Representations from Transformers, BERT). The model extends a generalized Enhanced Sequential Inference Model (ESIM) architecture with GeoBERT and integrates multiple features to improve the accuracy and robustness of toponym matching. We evaluate the proposed method on three large datasets. The results show that our approach outperforms the individual similarity metrics used in previous studies.
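A minimal sketch of the soft-alignment ("local inference") step at the core of the ESIM architecture that the abstract says the model extends. It follows the original ESIM formulation (Chen et al., arXiv:1609.06038), not the authors' code; how the paper adapts ESIM and combines it with GeoBERT and its additional features is not described in this record. For two encoded toponyms a and b, each position i of a gets a weighted summary ã_i = Σ_j softmax_j(a_i · b_j) b_j of b (and symmetrically for b), and each position is then enhanced as [a; ã; a − ã; a ⊙ ã]. The toy 2-d vectors are assumptions standing in for per-character contextual embeddings, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Sequence = List[Vector]  # one vector per character/token of a toponym


def dot(u: Vector, v: Vector) -> float:
    return sum(x * y for x, y in zip(u, v))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # shift by the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_align(query: Sequence, context: Sequence) -> Sequence:
    """For each query vector q_i, return sum_j softmax_j(q_i . c_j) * c_j."""
    dim = len(context[0])
    aligned = []
    for q in query:
        weights = softmax([dot(q, c) for c in context])
        aligned.append([sum(w * c[k] for w, c in zip(weights, context)) for k in range(dim)])
    return aligned


def enhance(x: Sequence, x_tilde: Sequence) -> Sequence:
    """Concatenate [x; x~; x - x~; x * x~] at every position."""
    return [
        u + v + [p - q for p, q in zip(u, v)] + [p * q for p, q in zip(u, v)]
        for u, v in zip(x, x_tilde)
    ]


def local_inference(a: Sequence, b: Sequence) -> Tuple[Sequence, Sequence]:
    """ESIM-style local inference between two encoded toponyms a and b."""
    a_tilde = soft_align(a, b)  # b's content aligned to each position of a
    b_tilde = soft_align(b, a)  # a's content aligned to each position of b
    return enhance(a, a_tilde), enhance(b, b_tilde)


if __name__ == "__main__":
    # Toy 2-d vectors standing in for contextual (e.g. GeoBERT) embeddings.
    a = [[1.0, 0.0], [0.0, 1.0]]
    b = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    m_a, m_b = local_inference(a, b)
    print(len(m_a), len(m_a[0]))  # 2 positions, 4 * 2 = 8 features each
    print(len(m_b), len(m_b[0]))  # 3 positions, 8 features each
```

In the original ESIM, these enhanced vectors are passed to a composition BiLSTM, pooled, and fed to a feed-forward classifier; in a toponym-matching setting that classifier would presumably output a match / no-match decision.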
Pages: 24