An efficient secure k nearest neighbor classification protocol with high-dimensional features

被引:10
|
作者
Sun, Maohua [1 ]
Yang, Ruidi [1 ]
机构
[1] Capital Univ Econ & Business, Sch Management & Engn, Beijing 100070, Peoples R China
关键词
fully homomorphic scheme; honest-but-curious model; machine learning; oblivious transfer; privacy-preserving Euclidean distance; secure kNN; secure multi-party computation; MACHINE;
D O I
10.1002/int.22272
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
k Nearest neighbor (kNN) classification algorithm is a prediction model which is widely used for real-life applications, such as healthcare, finance, computer vision, personalization recommendation and precision marketing. The arrival of data explosion era results in the significant increase of feature dimension, which also makes for the increase of privacy concern over the available samples and unlabeled data in the applications of machine learning. In this paper, we present a secure low communication overhead kNN classification protocol that is able to deal with high-dimensional features given in real numbers. First, to deal with feature values given in real numbers, we develop a specific data conversion algorithm, which is used in the chosen fully homomorphic scheme. This conversion algorithm is generic and applicable to other algorithms that need to handle real numbers using the fully homomorphic scheme. Second, we present a privacy-preserving euclidean distance protocol (PPEDP), which works with the Euclidean distance computation between two points given in real numbers in a high-dimensional space. Then, based on the novelty PPEDP and oblivious transfer, we propose a new classification approach, efficient secure kNN classification protocol, (ESkNN) with low communication overhead, which is appropriate for a sample set with high-dimensional features and real number feature values. Moreover, we implement ESkNN in C++. Experimental results show that ESkNN is several orders of magnitude faster in performance than existing works, and scales up to 18 000 feature dimension in a memory limited environment.
引用
收藏
页码:1791 / 1813
页数:23
相关论文
共 50 条
  • [1] Redefining nearest neighbor classification in high-dimensional settings
    Lopez, Julio
    Maldonado, Sebastian
    PATTERN RECOGNITION LETTERS, 2018, 110 : 36 - 43
  • [2] Hubness-Aware Shared Neighbor Distances for High-Dimensional k-Nearest Neighbor Classification
    Tomasev, Nenad
    Mladenic, Dunja
    HYBRID ARTIFICIAL INTELLIGENT SYSTEMS, PT II, 2012, 7209 : 116 - 127
  • [3] An efficient nearest neighbor search in high-dimensional data spaces
    Lee, DH
    Kim, HJ
    INFORMATION PROCESSING LETTERS, 2002, 81 (05) : 239 - 246
  • [4] Towards Secure Approximate k-Nearest Neighbor Query Over Encrypted High-Dimensional Data
    Peng, Yanguo
    Li, Hui
    Cui, Jiangtao
    Ma, Jianfeng
    Liu, Yingfan
    IEEE ACCESS, 2018, 6 : 23137 - 23151
  • [5] Hubness-based fuzzy measures for high-dimensional k-nearest neighbor classification
    Nenad Tomašev
    Miloš Radovanović
    Dunja Mladenić
    Mirjana Ivanović
    International Journal of Machine Learning and Cybernetics, 2014, 5 : 445 - 458
  • [6] Hubness-based fuzzy measures for high-dimensional k-nearest neighbor classification
    Tomasev, Nenad
    Radovanovic, Milos
    Mladenic, Dunja
    Ivanovic, Mirjana
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2014, 5 (03) : 445 - 458
  • [7] Hubness-aware shared neighbor distances for high-dimensional -nearest neighbor classification
    Tomasev, Nenad
    Mladenic, Dunja
    KNOWLEDGE AND INFORMATION SYSTEMS, 2014, 39 (01) : 89 - 122
  • [8] HDIdx: High-dimensional indexing for efficient approximate nearest neighbor search
    Wan, Ji
    Tang, Sheng
    Zhang, Yongdong
    Li, Jintao
    Wu, Pengcheng
    Hoi, Steven C. H.
    NEUROCOMPUTING, 2017, 237 : 401 - 404
  • [9] Nearest Neighbor Search in High-Dimensional Spaces
    Andoni, Alexandr
    MATHEMATICAL FOUNDATIONS OF COMPUTER SCIENCE 2011, 2011, 6907 : 1 - 1
  • [10] Accelerating high-dimensional nearest neighbor queries
    Lang, CA
    Singh, AK
    14TH INTERNATIONAL CONFERENCE ON SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT, PROCEEDINGS, 2002, : 109 - 118