Supporting Uncertain Predicates in DBMS Using Approximate String Matching and Probabilistic Databases

Times cited: 1
Authors
Jumde, Amol S. [1]
Keskar, Ravindra B. [1]
Affiliation
[1] Visvesvaraya Natl Inst Technol, Dept Comp Sci & Engn, Nagpur 440010, Maharashtra, India
Keywords
Approximate string matching; probabilistic databases; uncertain predicate; incomplete information; query evaluation; MayBMS; join
DOI
10.1109/ACCESS.2020.3021945
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Current relational database systems are deterministic in nature and lack support for approximate matching. The result of an approximate match would be a set of tuples annotated with their percentage of similarity, but existing relational database systems cannot process these similarity scores further. In this paper, we propose a system that supports approximate matching in a DBMS. We introduce an 'approximate to' operator (an uncertain predicate operator) for approximate matching and devise a novel formula to calculate the similarity scores. Instead of returning an empty answer set when nothing matches, our system returns ranked results, giving a glance at the existing tuples that most closely match the queried literals. Two variants of the 'approximate to' operator are also introduced for numeric data: 'approximate to+' for higher-the-better cases and 'approximate to-' for lower-the-better cases. Efficient approximate string matching methods are proposed for string-type data, whereas numeric closeness is used for other data types (date, time, and number). We also report results of our system on several sample queries that illustrate its significance. All experiments are performed using the MySQL database, with the IMDb movie database and the European Football database as sample datasets.
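The abstract does not give the authors' similarity formula, so what follows is only a minimal Python sketch of how an 'approximate to' style predicate could score and rank tuples instead of filtering them out. It assumes normalized Levenshtein similarity for strings (a common stand-in, not necessarily the paper's string matching method) and, for numbers, a linear closeness measure scaled by the spread of the column. The functions `levenshtein`, `string_similarity`, `numeric_similarity`, and `approximate_to`, and the mode strings `'='`, `'+'`, and `'-'`, are hypothetical names. Rows are plain Python dictionaries rather than MySQL tables.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion, and substitution."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string on the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def string_similarity(value: str, query: str) -> float:
    """Normalized edit similarity in [0, 1]; 1.0 is a case-insensitive exact match.

    Assumption: a stand-in for the paper's (unstated) similarity formula.
    """
    v, q = value.lower(), query.lower()
    longest = max(len(v), len(q))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(v, q) / longest


def numeric_similarity(value: float, query: float, spread: float, mode: str = "=") -> float:
    """Closeness in [0, 1] for numeric data (hypothetical scoring rule).

    mode '=' : plain 'approximate to', symmetric around the query.
    mode '+' : higher-the-better; values >= query score 1.0.
    mode '-' : lower-the-better; values <= query score 1.0.
    """
    if mode == "+" and value >= query:
        return 1.0
    if mode == "-" and value <= query:
        return 1.0
    if spread <= 0:
        return 1.0 if value == query else 0.0
    # Linear decay with distance, scaled by the assumed column spread.
    return max(0.0, 1.0 - abs(value - query) / spread)


def approximate_to(rows, column, query, mode="=", top_k=5):
    """Score every row against the query literal and return the top_k (score, row) pairs."""
    if mode not in ("=", "+", "-"):
        raise ValueError(f"unknown mode: {mode!r}")
    if not rows:
        return []
    if isinstance(query, str):
        scored = [(string_similarity(str(r[column]), query), r) for r in rows]
    else:
        values = [r[column] for r in rows]
        spread = max(values) - min(values)  # assumed scale for numeric closeness
        scored = [(numeric_similarity(r[column], query, spread, mode), r) for r in rows]
    # Rank instead of filter: weak matches stay in the answer, just lower down.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

For example, over toy rows titled "The Godfather", "The Godfather Part II", and "Goodfellas", `approximate_to(rows, "title", "The Godfater")` ranks "The Godfather" first (one edit away, score 12/13), where an exact `=` predicate would return an empty set. Scaling numeric distance by the column's spread keeps scores in [0, 1] so they can be shown as percentages, but a real system would need a deliberate choice of scale for dates and times.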
Pages: 169070-169081
Number of pages: 12