Learning Geolocation by Accurately Matching Customer Addresses via Graph based Active Learning

Cited by: 1
Authors
Maheshwary, Saket [1]
Sohoney, Saurabh [1]
Affiliations
[1] Amazon, Last Mile, Bellevue, WA 98109 USA
Source
COMPANION OF THE WORLD WIDE WEB CONFERENCE, WWW 2023 | 2023
Keywords
Active Learning; Entity Matching; Graph Theory; Geocoding;
DOI
10.1145/3543873.3584647
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose a novel adaptation of graph-based active learning for customer address resolution, or de-duplication, with the aim of determining whether two addresses represent the same physical building. For delivery systems, improving address resolution positively impacts multiple downstream systems such as geocoding, route planning and delivery time estimation, leading to an efficient and reliable delivery experience for both customers and delivery agents. Our proposed approach jointly leverages address text, past delivery information and concepts from graph theory to retrieve informative and diverse record pairs to label. We empirically show the effectiveness of our approach on a manually curated dataset of addresses from India (IN) and the United Arab Emirates (UAE). We achieved a 9.3% absolute improvement in recall on average across IN and UAE while preserving 95% precision over the existing production system. We also introduce delivery point (DP) geocode learning for cold-start addresses as a downstream application of address resolution. In addition to offline evaluation, we performed online A/B experiments, which show that when the production model is augmented with actively learnt record pairs, delivery precision improved by 7.84% and delivery defects were reduced by 12.32% on average across shipments from IN and UAE.
Pages: 457-463
Page count: 7
Related papers
45 in total
  • [1] Biemann Chris, 2007, P 2 WORKSH TEXTGRAPH
  • [2] Bilenko M., 2003, 9 ACM SIGKDD INTCONF, P39, DOI 10.1145/956750.956759
  • [3] Bilgic M., 2010, ICML
  • [4] Fast unfolding of communities in large networks
    Blondel, Vincent D.
    Guillaume, Jean-Loup
    Lambiotte, Renaud
    Lefebvre, Etienne
    [J]. JOURNAL OF STATISTICAL MECHANICS-THEORY AND EXPERIMENT, 2008,
  • [5] Bodo Z., 2011, JMLR, V16, P127
  • [6] Bojanowski P., 2017, Trans. ACL, V5, P135, DOI 10.1162/tacl_a_00051
  • [7] Buluç A, 2016, LECT NOTES COMPUT SC, V9220, P117, DOI 10.1007/978-3-319-49487-6_4
  • [8] XGBoost: A Scalable Tree Boosting System
    Chen, Tianqi
    Guestrin, Carlos
    [J]. KDD'16: PROCEEDINGS OF THE 22ND ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2016, : 785 - 794
  • [9] A Self-Balanced Min-Cut Algorithm for Image Clustering
    Chen, Xiaojun
    Huang, Joshua Zhexue
    Nie, Feiping
    Chen, Renjie
    Wu, Qingyao
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 2080 - 2088
  • [10] Chopde NR., 2013, International J Innov Res Comput Commun Eng, V1, P298, DOI 10.15680/IJIRCCE