Distribution-based aggregation for relational learning with identifier attributes

Cited by: 61
Authors
Perlich, C [1]
Provost, F
Affiliations
[1] IBM TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA
[2] NYU, New York, NY USA
Keywords
identifiers; relational learning; aggregation; networks
DOI
10.1007/s10994-006-6064-1
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Identifier attributes (very high-dimensional categorical attributes such as particular product ids or people's names) rarely are incorporated in statistical modeling. However, they can play an important role in relational modeling: it may be informative to have communicated with a particular set of people or to have purchased a particular set of products. A key limitation of existing relational modeling techniques is how they aggregate bags (multisets) of values from related entities. The aggregations used by existing methods are simple summaries of the distributions of features of related entities: e.g., MEAN, MODE, SUM, or COUNT. This paper's main contribution is the introduction of aggregation operators that capture more information about the value distributions, by storing meta-data about value distributions and referencing this meta-data when aggregating, for example by computing class-conditional distributional distances. Such aggregations are particularly important for aggregating values from high-dimensional categorical attributes, for which the simple aggregates provide little information. In the first half of the paper we provide general guidelines for designing aggregation operators, introduce the new aggregators in the context of the relational learning system ACORA (Automated Construction of Relational Attributes), and provide theoretical justification. We also conjecture special properties of identifier attributes, e.g., they proxy for unobserved attributes and for information deeper in the relationship network. In the second half of the paper we provide extensive empirical evidence that the distribution-based aggregators indeed do facilitate modeling with high-dimensional categorical attributes, and in support of the aforementioned conjectures.
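As a rough illustration of the idea of "class-conditional distributional distances" mentioned in the abstract, the following minimal Python sketch stores per-class counts of identifier values as reference meta-data and then describes a new bag by its distance to each class reference. It is not the ACORA implementation: the function names, the toy product ids, and the choice of cosine distance are all assumptions made for this example, and the abstract does not say which distance measures the paper uses.

```python
from collections import Counter
import math


def class_conditional_counts(bags, labels):
    """Hypothetical helper: sum identifier counts over all training bags of each class.

    The resulting per-class Counter plays the role of the stored meta-data.
    """
    refs = {}
    for bag, label in zip(bags, labels):
        refs.setdefault(label, Counter()).update(bag)
    return refs


def cosine_distance(a, b):
    """Cosine distance between two sparse count vectors (an assumed choice of distance)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty bag as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def aggregate(bag, refs):
    """Turn one bag of identifiers into one numeric feature per class."""
    counts = Counter(bag)
    return {label: cosine_distance(counts, ref) for label, ref in refs.items()}


# Toy data (invented): each bag holds the product ids bought by one customer.
train_bags = [["p1", "p2", "p2"], ["p1", "p2"], ["p7", "p8"], ["p8", "p9", "p9"]]
train_labels = [1, 1, 0, 0]

refs = class_conditional_counts(train_bags, train_labels)
features = aggregate(["p2", "p1"], refs)
```

Here `features` maps each class label to a distance, so a bag of arbitrary size over a very large identifier vocabulary is reduced to a few numeric attributes that a conventional learner can consume. In a real setting the reference counts would need to exclude the case being scored to avoid leaking its label.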
Pages: 65-105
Page count: 41