DIFF: a relational interface for large-scale data explanation

Cited by: 0
Authors
Firas Abuzaid
Peter Kraft
Sahaana Suri
Edward Gan
Eric Xu
Atul Shenoy
Asvin Ananthanarayan
John Sheu
Erik Meijer
Xi Wu
Jeff Naughton
Peter Bailis
Matei Zaharia
Affiliations
[1] Stanford University, Stanford DAWN Project
[2] Microsoft Inc
[3] Facebook Inc
[4] Google Inc
Source
The VLDB Journal | 2021 / Vol. 30
Keywords
Data exploration; Explanations; Big data; Data analytics; Databases; Feature selection; Query optimization
DOI
Not available
Abstract
A range of explanation engines assist data analysts by performing feature selection over increasingly high-volume and high-dimensional data, grouping and highlighting commonalities among data points. While useful in diverse tasks such as user behavior analytics, operational event processing, and root-cause analysis, today’s explanation engines are designed as stand-alone data processing tools that do not interoperate with traditional, SQL-based analytics workflows; this limits the applicability and extensibility of these engines. In response, we propose the DIFF operator, a relational aggregation operator that unifies the core functionality of these engines with declarative relational query processing. We implement both single-node and distributed versions of the DIFF operator in MB SQL, an extension of MacroBase, and demonstrate how DIFF can provide the same semantics as existing explanation engines while capturing a broad set of production use cases in industry, including at Microsoft and Facebook. Additionally, we illustrate how this declarative approach to data explanation enables new logical and physical query optimizations. We evaluate these optimizations on several real-world production applications and find that DIFF in MB SQL can outperform state-of-the-art engines by up to an order of magnitude.
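As a rough illustration of what an explanation-style aggregation over two relations might look like, the following minimal Python sketch compares a "test" relation against a "control" relation and reports attribute-value combinations that are over-represented in the test relation. It is not the DIFF operator as implemented in MB SQL: the function name `diff`, the `min_support` and `min_ratio` thresholds, the `max_order` limit, and the sample rows are all assumptions made for this example, since the abstract does not specify the operator's metrics or syntax.

```python
from collections import Counter
from itertools import combinations


def diff(test_rows, control_rows, attributes,
         min_support=0.5, min_ratio=2.0, max_order=2):
    """Hypothetical sketch: find attribute-value combinations that are
    over-represented in test_rows relative to control_rows.

    The metrics (support, ratio of shares) are illustrative assumptions,
    not taken from the source abstract.
    """

    def count_combos(rows):
        # Count every combination of up to max_order attribute values.
        counts = Counter()
        for row in rows:
            for order in range(1, max_order + 1):
                for attrs in combinations(attributes, order):
                    counts[tuple((a, row[a]) for a in attrs)] += 1
        return counts

    test_counts = count_combos(test_rows)
    control_counts = count_combos(control_rows)

    results = []
    for combo, n_test in test_counts.items():
        # Share of test rows that match this combination.
        support = n_test / len(test_rows)
        if support < min_support:
            continue
        # Share of control rows that match the same combination.
        n_control = control_counts.get(combo, 0)
        control_share = n_control / len(control_rows) if control_rows else 0.0
        ratio = float("inf") if control_share == 0 else support / control_share
        if ratio >= min_ratio:
            results.append((combo, support, ratio))

    # Highest ratio first, then highest support.
    return sorted(results, key=lambda r: (-r[2], -r[1]))


# Made-up sample data: crashing sessions versus healthy sessions.
crashes = [
    {"device": "A", "version": "v2"},
    {"device": "A", "version": "v2"},
    {"device": "B", "version": "v2"},
    {"device": "A", "version": "v1"},
]
healthy = [
    {"device": "A", "version": "v1"},
    {"device": "B", "version": "v1"},
    {"device": "B", "version": "v2"},
    {"device": "A", "version": "v1"},
]

explanations = diff(crashes, healthy, ["device", "version"])
for combo, support, ratio in explanations:
    print(combo, support, ratio)
```

In this sketch the two inputs play the role of two relations, and the grouping over attribute combinations plays the role of an aggregation, which is the general shape the abstract describes. A relational engine could, in principle, apply query optimizations to such an operator; the sketch does not attempt any.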
Pages: 45-70
Page count: 25