Efficient and Effective Similarity Search over Probabilistic Data based on Earth Mover's Distance

被引:26
|
作者
Xu, Jia [1 ]
Zhang, Zhenjie [2 ]
Tung, Anthony K. H. [2 ]
Yu, Ge [1 ]
机构
[1] Northeastern Univ, Coll Info Sci & Engn, Shenyang, Liaoning, Peoples R China
[2] Natl Univ Singapore, Sch Comp, Singapore, Singapore
来源
PROCEEDINGS OF THE VLDB ENDOWMENT | 2010年 / 3卷 / 01期
关键词
D O I
10.14778/1920841.1920938
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Probabilistic data is coming as a new deluge along with the technical advances on geographical tracking, multimedia processing, sensor network and RFID. While similarity search is an important functionality supporting the manipulation of probabilistic data, it raises new challenges to traditional relational database. The problem stems from the limited effectiveness of the distance metric supported by the existing database system. On the other hand, some complicated distance operators have proven their values for better distinguishing ability in the probabilistic domain. In this paper, we discuss the similarity search problem with the Earth Mover's Distance, which is the most successful distance metric on probabilistic histograms and an expensive operator with cubic complexity. We present a new database approach to answer range queries and k-nearest neighbor queries on probabilistic data, on the basis of Earth Mover's Distance. Our solution utilizes the primal-dual theory in linear programming and deploys B+ tree index structures for effective candidate pruning. Extensive experiments show that our proposal dramatically improves the scalability of probabilistic databases.
引用
收藏
页码:758 / 769
页数:12
相关论文
共 50 条
  • [1] Efficient and effective similarity search over probabilistic data based on Earth Mover's Distance
    Xu, Jia
    Zhang, Zhenjie
    Tung, Anthony K. H.
    Yu, Ge
    VLDB JOURNAL, 2012, 21 (04): : 535 - 559
  • [2] Efficient and effective similarity search over probabilistic data based on Earth Mover’s Distance
    Jia Xu
    Zhenjie Zhang
    Anthony K. H. Tung
    Ge Yu
    The VLDB Journal, 2012, 21 : 535 - 559
  • [3] EMD-DSJoin: Efficient Similarity Join Over Probabilistic Data Streams Based on Earth Mover's Distance
    Xu, Jia
    Zhang, Jiazhen
    Song, Chao
    Zhang, Qianzhen
    Lv, Pin
    Li, Taoshen
    Chen, Ningjiang
    WEB TECHNOLOGIES AND APPLICATIONS: APWEB 2016 WORKSHOPS, WDMA, GAP, AND SDMA, 2016, 9865 : 42 - 54
  • [4] Earth Mover's Distance based Similarity Search at Scale
    Tang, Yu
    Hou, Leong U.
    Cai, Yilun
    Mamoulis, Nikos
    Cheng, Reynold
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2013, 7 (04): : 313 - 324
  • [5] Distributed Similarity Join Over Data Streams Based on Earth Mover's Distance
    Xu J.
    Song C.
    Lv P.
    Li T.-S.
    Jisuanji Xuebao/Chinese Journal of Computers, 2019, 42 (08): : 1779 - 1796
  • [6] Efficient similarity search using the Earth Mover's Distance for large multimedia databases
    Assent, Ira
    Wichterich, Marc
    Meisen, Tobias
    Seidl, Thomas
    2008 IEEE 24TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, VOLS 1-3, 2008, : 307 - 316
  • [7] Efficient Similarity Join Based on Earth Mover's Distance Using MapReduce
    Xu, Jia
    Lei, Bin
    Gu, Yu
    Winslett, Marianne
    Yu, Ge
    Zhang, Zhenjie
    2016 32ND IEEE INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2016, : 1456 - 1457
  • [8] Efficient Similarity Join Based on Earth Mover's Distance Using MapReduce
    Xu, Jia
    Lei, Bin
    Gu, Yu
    Winslett, Marianne
    Yu, Ge
    Zhang, Zhenjie
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2015, 27 (08) : 2148 - 2162
  • [10] Keyword Search over Web Documents Based on Earth Mover's Distance
    Ma, Jiangang
    Sheng, Quan Z.
    Yao, Lina
    Xu, Yong
    Shemshadi, Ali
    WEB INFORMATION SYSTEMS ENGINEERING - WISE 2014, PT I, 2014, 8786 : 256 - 265