Probabilistic Label Tree for Streaming Multi-Label Learning

Cited by: 7
Authors
Wei, Tong [1 ]
Shi, Jiang-Xin [1 ]
Li, Yu-Feng [1 ]
Affiliations
[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing 210023, Peoples R China
Source
KDD '21: PROCEEDINGS OF THE 27TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING | 2021
Funding
National Key R&D Program of China
Keywords
multi-label learning; streaming label learning
DOI
10.1145/3447548.3467226
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Multi-label learning aims to predict a subset of relevant labels for each instance and has many real-world applications. Most existing multi-label learning studies assume a fixed label space. In many cases, however, the environment is open and changes gradually as new labels emerge, a setting coined streaming multi-label learning (SMLL). SMLL poses two great challenges: (1) the target output space expands dynamically; (2) new labels emerge frequently and can reach a significantly large number. Previous attempts at SMLL leverage label correlations between past and emerging labels to improve performance, but they are inefficient on large-scale problems. To cope with these challenges, this paper presents a new learning framework, the probabilistic streaming label tree (Pslt). In particular, each non-leaf node of the tree corresponds to a subset of labels, and a binary classifier is learned at each leaf node. Pslt is initially learned on partially observed labels; both the tree structure and the node classifiers are updated as new labels emerge. Using a carefully designed update mechanism, Pslt seamlessly incorporates new labels by first passing them down from the root to the leaf nodes and then updating the node classifiers accordingly. We provide theoretical bounds on the iteration complexity of the tree-update procedure and the estimation error on newly arrived labels. Experiments show that the proposed approach outperforms eleven baselines on multiple evaluation metrics. The source code is available at https://gitee.com/pslt-kdd2021/pslt.
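The core idea sketched in the abstract — routing a newly emerged label from the root down to a leaf, where its classifier would then be updated — can be illustrated with a minimal toy sketch. This is NOT the paper's Pslt algorithm: the `Node` class, the `insert_label` helper, and the cosine-similarity routing rule are assumptions made purely for illustration; Pslt itself uses probabilistic node classifiers and a principled update mechanism with proven guarantees.

```python
# Toy sketch of routing a new label through a label tree (hypothetical,
# not the Pslt algorithm): each node covers a subset of label ids, and a
# new label greedily descends toward the child whose label centroid is
# most cosine-similar to the new label's embedding.
import numpy as np

class Node:
    def __init__(self, labels, children=None):
        self.labels = list(labels)       # label ids covered by this subtree
        self.children = children or []   # empty list => leaf node

def centroid(node, emb):
    """Mean embedding of the labels covered by a node."""
    return np.mean([emb[l] for l in node.labels], axis=0)

def insert_label(root, new_id, new_vec, emb):
    """Pass a new label from the root to a leaf, choosing at each step
    the child with the most similar label centroid, and add the label
    to every node on the path. Returns the root-to-leaf path."""
    emb[new_id] = new_vec
    node = root
    node.labels.append(new_id)
    path = [node]
    while node.children:
        sims = [
            np.dot(centroid(c, emb), new_vec)
            / (np.linalg.norm(centroid(c, emb)) * np.linalg.norm(new_vec) + 1e-12)
            for c in node.children
        ]
        node = node.children[int(np.argmax(sims))]
        node.labels.append(new_id)
        path.append(node)
    return path  # the leaf's binary classifier would be updated next

# Hypothetical demo: a depth-1 tree over two labels with 2-d embeddings.
emb = {0: np.array([1.0, 0.0]), 1: np.array([0.0, 1.0])}
root = Node([0, 1], children=[Node([0]), Node([1])])
path = insert_label(root, 2, np.array([0.9, 0.1]), emb)
```

In the demo, the new label's embedding is close to label 0, so it is routed into the left leaf; in the actual framework, the leaf's binary classifier (and possibly the tree structure) is then updated accordingly.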
Pages: 1801-1811 (11 pages)