An empirical study of on-line models for relational data streams

Cited by: 0
Authors
Ashwin Srinivasan
Michael Bain
Affiliations
[1] BITS Pilani,Department of Computer Science and Information Systems
[2] K.K. Birla Goa Campus,School of Computer Science and Engineering
[3] UNSW
Source
Machine Learning | 2017 / Volume 106
Keywords
Inductive Logic Programming; Data streams; Online learning
DOI
Not available
Abstract
To date, Inductive Logic Programming (ILP) systems have largely assumed that all data needed for learning have been provided at the onset of model construction. Increasingly, for application areas like telecommunications, astronomy, text processing, financial markets and biology, machine-generated data are being generated continuously and on a vast scale. We see at least four kinds of problems that this presents for ILP: (1) it may not be possible to store all of the data, even in secondary memory; (2) even if it were possible to store the data, it may be impractical to construct an acceptable model using partitioning techniques that repeatedly perform expensive coverage or subsumption-tests on the data; (3) models constructed at some point may become less effective, or even invalid, as more data become available (exemplified by the “drift” problem when identifying concepts); and (4) the representation of the data instances may need to change as more data become available (a kind of “language drift” problem). In this paper, we investigate the adoption of a stream-based on-line learning approach to relational data. Specifically, we examine the representation of relational data in both an infinite-attribute setting, and in the usual fixed-attribute setting, and develop implementations that use ILP engines in combination with on-line model-constructors. The behaviour of each program is investigated using a set of controlled experiments, and performance in practical settings is demonstrated by constructing complete theories for some of the largest biochemical datasets examined by ILP systems to date, including one with a million examples; to the best of our knowledge, the first time this has been empirically demonstrated with ILP on a real-world data set.
Pages: 243-276
Page count: 33