In-Database Learning with Sparse Tensors

Cited by: 36
Authors
Khamis, Mahmoud Abo [1]
Ngo, Hung Q. [1]
Nguyen, XuanLong [2]
Olteanu, Dan [3]
Schleich, Maximilian [3]
Affiliations
[1] RelationalAI Inc, Berkeley, CA 94704 USA
[2] Univ Michigan, Ann Arbor, MI 48109 USA
[3] Univ Oxford, Oxford, England
Source
PODS'18: PROCEEDINGS OF THE 37TH ACM SIGMOD-SIGACT-SIGAI SYMPOSIUM ON PRINCIPLES OF DATABASE SYSTEMS | 2018
Keywords
In-database analytics; Functional aggregate queries; Functional dependencies; Model reparameterization; Tensors; LIBRARY;
DOI
10.1145/3196959.3196960
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In-database analytics is of great practical importance as it avoids the costly repeated loop data scientists have to deal with on a daily basis: select features, export the data, convert data format, train models using an external tool, reimport the parameters. It is also a fertile ground of theoretically fundamental and challenging problems at the intersection of relational and statistical data models. This paper introduces a unified framework for training and evaluating a class of statistical learning models inside a relational database. This class includes ridge linear regression, polynomial regression, factorization machines, and principal component analysis. We show that, by synergizing key tools from relational database theory such as schema information, query structure, recent advances in query evaluation algorithms, and from linear algebra such as various tensor and matrix operations, one can formulate in-database learning problems and design efficient algorithms to solve them. The algorithms and models proposed in the paper have already been implemented and deployed in retail-planning and forecasting applications, with significant performance benefits over out-of-database solutions that require the costly data-export loop.
Pages: 325-340
Number of pages: 16