Multi-task deep learning model based on hierarchical relations of address elements for semantic address matching

Cited by: 10
Authors
Li, Fangfang [1]
Lu, Yiheng [1]
Mao, Xingliang [2]
Duan, Junwen [1]
Liu, Xiyao [1]
Affiliations
[1] Cent South Univ, Sch Comp Sci & Engn, Changsha, Peoples R China
[2] Hunan Univ Technol & Business, Inst Big Data & Internet Innovat, Changsha, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Address matching; Multi-task learning; Recognizes the address elements; Hierarchical relations between address elements; RECOGNITION; WORD2VEC;
DOI
10.1007/s00521-022-06914-1
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Address matching, which aims to match unstructured addresses with standard addresses in an address database, is a key part of geocoding. The core problem of address matching corresponds to text matching in natural language processing. Existing rule-based methods require human-designed templates and thus have limited applicability. Machine learning and deep learning-based methods ignore the hierarchical relations between address elements and therefore easily misclassify semantically similar but geographically different locations. We note that the hierarchy of address elements can fill this semantic gap in address matching. Inspired by how humans discriminate between addresses, we propose a multi-task learning approach that jointly recognises the address elements and matches the addresses, thereby incorporating the hierarchical relations between the address elements into the neural network. At the same time, we introduce a priori information on the hierarchical relations of address elements through a conditional random field model. Experimental results on the benchmark datasets Shenzhen Address Database and Jiangsu-Hunan Address Dataset demonstrate the effectiveness of our approach. We achieve state-of-the-art F1 scores (i.e. the harmonic mean of precision and recall) of 99.0 and 94.2 on the two datasets, respectively.
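The F1 score mentioned in the abstract can be illustrated with a minimal sketch. The function names and the counts below are hypothetical and chosen only for illustration; they are not taken from the paper or its datasets.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Compute precision and recall from true-positive, false-positive
    and false-negative counts, returning 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for a binary match / no-match decision.
p, r = precision_recall(tp=90, fp=10, fn=30)
print(f"precision={p:.3f} recall={r:.3f} F1={f1_score(p, r):.3f}")
```

Because the harmonic mean is pulled towards the smaller of its two inputs, a matcher cannot reach a high F1 by maximising precision or recall alone.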
Pages: 8919-8931
Number of pages: 13