Multistream Regression with Asynchronous Concept Drift Detection

Citations: 0
Authors
Dong, Bo [1]
Li, Yifan [1]
Gao, Yang [1]
Haque, Ahsanul [1]
Khan, Latifur [1]
Masud, Mohammad M. [2]
Affiliations
[1] Univ Texas Dallas, Richardson, TX 75083 USA
[2] United Arab Emirates Univ, Al Ain, U Arab Emirates
Source
2017 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) | 2017
Keywords
multistream; regression; covariate shift; concept drift
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
A recently introduced problem setting, referred to as multistream, involves two independent non-stationary data-generating processes. One, called the source stream, continuously generates data instances with true output values. The other, called the target stream, generates data instances that lack true output values. Past studies have addressed prediction problems on data streams under scenarios such as covariate shift or concept drift by examining one assumption while holding the others fixed. For example, it is assumed that the training and testing data follow similar distributions, and that the true output values of stream instances will be available soon. In practice, however, these assumptions do not always hold. The multistream regression problem is to predict the output of the target stream using data instances and their true outputs from the source stream. In this paper, we propose an approach to multistream regression that incorporates concept drift detection into covariate shift adaptation. Empirical evaluation on synthetic and real-world datasets demonstrates the effectiveness of the proposed technique against state-of-the-art approaches. Experimental results indicate that our method significantly improves prediction performance compared with existing benchmarks.
Pages: 596-605
Number of pages: 10