Similarity Query Processing for High-Dimensional Data

Times cited: 8
Authors
Qin, Jianbin [1]
Wang, Wei [2]
Xiao, Chuan [3,4]
Zhang, Ying [5]
Affiliations
[1] Shenzhen Univ, Shenzhen Inst Comp Sci, Shenzhen, Guangdong, Peoples R China
[2] Univ New South Wales, Sydney, NSW, Australia
[3] Osaka Univ, Suita, Osaka, Japan
[4] Nagoya Univ, Nagoya, Aichi, Japan
[5] Univ Technol Sydney, Sydney, NSW, Australia
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2020 / Vol. 13 / No. 12
Keywords
NEAREST-NEIGHBOR SEARCH; SMALL WORLD; ALGORITHM; SPACE; LSH;
DOI
10.14778/3415478.3415564
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Similarity query processing has been an active research topic for several decades. It is an essential procedure in a wide range of applications. Recently, embedding and auto-encoding methods as well as pre-trained models have gained popularity. They basically deal with high-dimensional data, and this trend brings new opportunities and challenges to similarity query processing for high-dimensional data. Meanwhile, new techniques have emerged to tackle this long-standing problem theoretically and empirically. In this tutorial, we summarize existing solutions, especially recent advancements from both database (DB) and machine learning (ML) communities, and analyze their strengths and weaknesses. We review exact and approximate methods such as cover tree, locality sensitive hashing, product quantization, and proximity graphs. We also discuss the selectivity estimation problem and show how researchers are bringing in state-of-the-art ML techniques to address the problem. By highlighting the strong connections between DB and ML, we hope that this tutorial provides an impetus towards new ML for DB solutions and vice versa.
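The abstract lists locality sensitive hashing among the approximate methods the tutorial reviews. The minimal Python sketch below illustrates that family only. It is not the tutorial's own method. It uses random-hyperplane LSH for cosine similarity, the SimHash-style scheme due to Charikar, in which two vectors at angle θ agree on each hash bit with probability 1 - θ/π. It assumes a single hash table, and the names `HyperplaneLSH`, `signature`, `random_hyperplanes` and `cosine` are hypothetical, chosen for illustration.

```python
import math
import random


def random_hyperplanes(dim, num_bits, seed=0):
    """Draw num_bits hyperplane normals with i.i.d. Gaussian entries in R^dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vec, planes):
    """Hash a vector to a bit tuple: bit i is 1 if vec lies on the non-negative side of plane i."""
    return tuple(int(sum(p * v for p, v in zip(plane, vec)) >= 0.0) for plane in planes)


def cosine(a, b):
    """Exact cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


class HyperplaneLSH:
    """Hypothetical single-table index: vectors with equal signatures share a bucket."""

    def __init__(self, dim, num_bits=8, seed=0):
        self.planes = random_hyperplanes(dim, num_bits, seed)
        self.buckets = {}  # signature -> list of vector ids
        self.data = []     # id -> stored vector

    def add(self, vec):
        """Store vec, file its id under its signature, and return the id."""
        idx = len(self.data)
        self.data.append(vec)
        self.buckets.setdefault(signature(vec, self.planes), []).append(idx)
        return idx

    def query(self, q, k=1):
        """Approximate search: rank only q's bucket by exact cosine and return up to k ids."""
        candidates = self.buckets.get(signature(q, self.planes), [])
        ranked = sorted(candidates, key=lambda i: cosine(q, self.data[i]), reverse=True)
        return ranked[:k]
```

Because the sketch scans only the query's bucket, it can miss true nearest neighbors that were hashed elsewhere. LSH indexes in practice typically build several tables and tune the bits per table, which trades recall against the number of candidates examined.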
Pages: 3437-3440
Number of pages: 4
Related papers (50 in total)
  • [1] High-Dimensional Similarity Query Processing for Data Science
    Qin, Jianbin
    Wang, Wei
    Xiao, Chuan
    Zhang, Ying
    Wang, Yaoshu
    KDD '21: PROCEEDINGS OF THE 27TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2021: 4062-4063
  • [2] qwLSH: Cache-conscious Indexing for Processing Similarity Search Query Workloads in High-Dimensional Spaces
    Jafari, Omid
    Ossorgin, John
    Nagarkar, Parth
    ICMR'19: PROCEEDINGS OF THE 2019 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 2019: 329-333
  • [3] High-Dimensional Similarity Search for Scalable Data Science
    Echihabi, Karima
    Zoumpatianos, Kostas
    Palpanas, Themis
    2021 IEEE 37TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2021), 2021: 2369-2372
  • [4] Subspace Clustering for High-Dimensional Data Using Cluster Structure Similarity
    Fatehi, Kavan
    Rezvani, Mohsen
    Fateh, Mansoor
    Pajoohan, Mohammad-Reza
    INTERNATIONAL JOURNAL OF INTELLIGENT INFORMATION TECHNOLOGIES, 2018, 14(03): 38-55
  • [5] PHiDJ: Parallel Similarity Self-Join for High-Dimensional Vector Data with MapReduce
    Fries, Sergej
    Boden, Brigitte
    Stepien, Grzegorz
    Seidl, Thomas
    2014 IEEE 30TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2014: 796-807
  • [6] Efficient parallel processing of high-dimensional spatial kNN queries
    Jiang, Tao
    Zhang, Bin
    Lin, Dan
    Gao, Yunjun
    Li, Qing
    SOFT COMPUTING, 2022, 26(22): 12291-12316
  • [7] Outlier detection for high-dimensional data
    Ro, Kwangil
    Zou, Changliang
    Wang, Zhaojun
    Yin, Guosheng
    BIOMETRIKA, 2015, 102(03): 589-599
  • [8] A fast and scalable similarity search in high-dimensional image datasets
    Hanyf, Youssef
    Silkan, Hassan
    INTERNATIONAL JOURNAL OF COMPUTER APPLICATIONS IN TECHNOLOGY, 2019, 59(01): 95-104
  • [9] One-dimensional VGGNet for high-dimensional data
    Feng, Sheng
    Zhao, Liping
    Shi, Haiyan
    Wang, Mengfei
    Shen, Shigen
    Wang, Weixing
    APPLIED SOFT COMPUTING, 2023, 135
  • [10] High Dimensional Similarity Search With Satellite System Graph: Efficiency, Scalability, and Unindexed Query Compatibility
    Fu, Cong
    Wang, Changxu
    Cai, Deng
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44(08): 4139-4150