Streaming random forests

Cited by: 0
Authors
Abdulsalam, Hanady [1]
Skillicorn, David B. [1]
Martin, Patrick [1]
Affiliations
[1] Queens Univ, Sch Comp, Kingston, ON K7L 3N6, Canada
Source
IDEAS 2007: 11TH INTERNATIONAL DATABASE ENGINEERING AND APPLICATIONS SYMPOSIUM, PROCEEDINGS | 2007
Keywords
data mining; classification; decision trees; data-stream classification; random forests
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Many recent applications deal with data streams, conceptually endless sequences of data records, often arriving at high flow rates. Standard data-mining techniques typically assume that records can be accessed multiple times and so do not naturally extend to streaming data. Algorithms for mining streams must be able to extract all necessary information from records with only one, or perhaps a few, passes over the data. We present the Streaming Random Forests algorithm, an online and incremental stream classification algorithm that extends Breiman's Random Forests algorithm. The Streaming Random Forests algorithm grows multiple decision trees, and classifies unlabelled records based on the plurality of tree votes. We evaluate the classification accuracy of the Streaming Random Forests algorithm on several datasets, and show that its accuracy is comparable to the standard Random Forest algorithm.
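The plurality-vote step mentioned in the abstract can be sketched as follows. This is a minimal illustration only, not the authors' implementation: it does not show how the streaming trees are grown, and the names `plurality_vote`, `Record`, `Tree`, and `stub_trees` are hypothetical stand-ins introduced here. Each "tree" is assumed to be any callable that maps a record to a class label.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence

# Assumed types for illustration: a record is a sequence of numeric
# attributes, and a tree is any callable that returns a class label.
Record = Sequence[float]
Tree = Callable[[Record], Hashable]


def plurality_vote(trees: Sequence[Tree], record: Record) -> Hashable:
    """Return the class label predicted by the largest number of trees."""
    if not trees:
        raise ValueError("the forest must contain at least one tree")
    # Collect one vote per tree.
    votes = Counter(tree(record) for tree in trees)
    # most_common(1) yields the label with the highest count; labels with
    # equal counts keep the order in which they were first encountered.
    label, _count = votes.most_common(1)[0]
    return label


# Hypothetical stand-in trees: simple threshold rules, not trained models.
stub_trees = [
    lambda r: "a" if r[0] > 0.5 else "b",
    lambda r: "a" if r[1] > 0.5 else "b",
    lambda r: "b",
]

prediction = plurality_vote(stub_trees, [0.9, 0.8])
```

Here two of the three stand-in trees vote "a" for the record `[0.9, 0.8]`, so the forest predicts "a". A plurality (rather than a strict majority) is used so that the rule still yields a label when there are more than two classes and no class receives over half the votes.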
Pages: 225-232
Page count: 8
Related papers
18 in total
  • [1] Aggarwal C.C., 2003, P 29 INT C VER LARG, P81, DOI 10.1016/B978-012722442-8/50016-1
  • [2] [Anonymous], P 19 S APPL COMP
  • [3] [Anonymous], 2003, Proc. ACM SIGKDD Int. Conf. Knowl. Discov. Data Min, DOI 10.1145/956750.956778
  • [4] [Anonymous], CLASIFICATION REGRES
  • [5] BLACKARD JA, 1998, THESIS COLORADO STAT
  • [6] BREIMAN L, 1999, RANDOM FORESTS TECHN
  • [7] Bulut A, 2005, PROC INT CONF DATA, P44
  • [8] An adaptive learning approach for noisy data streams
    Chu, F
    Wang, YZ
    Zaniolo, C
    [J]. FOURTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2004, : 351 - 354
  • [9] Domingos P., 2000, Proceedings. KDD-2000. Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, P71, DOI 10.1145/347090.347107
  • [10] Fan W., 2004, P 10 ACM SIGKDD INT, P128