ML-KNN: A lazy learning approach to multi-label learning

Cited by: 2491
Authors
Zhang, Min-Ling [1]
Zhou, Zhi-Hua [1]
Affiliations
[1] Nanjing Univ, Natl Lab Novel Software Technol, Nanjing 210093, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
machine learning; multi-label learning; lazy learning; K-nearest neighbor; functional genomics; natural scene classification; text categorization;
DOI
10.1016/j.patcog.2006.12.019
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multi-label learning originated from the investigation of text categorization problem, where each document may belong to several predefined topics simultaneously. In multi-label learning, the training set is composed of instances each associated with a set of labels, and the task is to predict the label sets of unseen instances through analyzing training instances with known label sets. In this paper, a multi-label lazy learning approach named ML-KNN is presented, which is derived from the traditional K-nearest neighbor (KNN) algorithm. In detail, for each unseen instance, its K nearest neighbors in the training set are firstly identified. After that, based on statistical information gained from the label sets of these neighboring instances, i.e. the number of neighboring instances belonging to each possible class, maximum a posteriori (MAP) principle is utilized to determine the label set for the unseen instance. Experiments on three different real-world multi-label learning problems, i.e. Yeast gene functional analysis, natural scene classification and automatic web page categorization, show that ML-KNN achieves superior performance to some well-established multi-label learning algorithms. © 2007 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
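The procedure described in the abstract (find the K nearest neighbors, count how many of them carry each label, then apply the MAP rule per label) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `fit` and `predict`, the Euclidean distance, the smoothing constant `s`, and the toy data are all assumptions introduced here and are not stated in the abstract.

```python
import math

def _knn(X, query, k, skip=None):
    """Indices of the k training points nearest to `query` (Euclidean, assumed)."""
    dists = sorted((math.dist(x, query), i) for i, x in enumerate(X) if i != skip)
    return [i for _, i in dists[:k]]

def fit(X, Y, k, s=1.0):
    """Estimate priors and neighbor-count likelihood tables from training data.

    X: list of feature tuples; Y: list of label sets; s: smoothing (assumed).
    """
    labels = sorted(set().union(*Y))
    m = len(X)
    # Smoothed prior probability that an instance carries each label.
    prior = {l: (s + sum(l in y for y in Y)) / (2 * s + m) for l in labels}
    # c_pos[l][c]: training instances WITH label l whose k neighbors include
    # exactly c instances with label l; c_neg[l][c]: same for instances WITHOUT l.
    c_pos = {l: [0] * (k + 1) for l in labels}
    c_neg = {l: [0] * (k + 1) for l in labels}
    for i in range(m):
        nbrs = _knn(X, X[i], k, skip=i)  # leave the instance itself out
        for l in labels:
            cnt = sum(l in Y[j] for j in nbrs)
            (c_pos if l in Y[i] else c_neg)[l][cnt] += 1
    return {"X": X, "Y": Y, "k": k, "s": s, "labels": labels,
            "prior": prior, "c_pos": c_pos, "c_neg": c_neg}

def predict(model, query):
    """Return the label set chosen by the per-label MAP rule."""
    k, s = model["k"], model["s"]
    nbrs = _knn(model["X"], query, k)
    chosen = set()
    for l in model["labels"]:
        cnt = sum(l in model["Y"][j] for j in nbrs)
        pos, neg = model["c_pos"][l], model["c_neg"][l]
        like_pos = (s + pos[cnt]) / (s * (k + 1) + sum(pos))
        like_neg = (s + neg[cnt]) / (s * (k + 1) + sum(neg))
        p = model["prior"][l]
        # MAP decision: keep the label if "has label" is the more probable event.
        if p * like_pos > (1 - p) * like_neg:
            chosen.add(l)
    return chosen

# Toy data (hypothetical): three well-separated clusters with different label sets.
X_train = [(0, 0), (0, 1), (1, 0),
           (10, 10), (10, 11), (11, 10),
           (20, 0), (20, 1), (21, 0)]
Y_train = [{"a"}] * 3 + [{"b"}] * 3 + [{"a", "b"}] * 3
model = fit(X_train, Y_train, k=2)
print(predict(model, (20.2, 0.1)))
```

Each label is decided independently, so a query can receive zero, one, or several labels. The method is lazy in the sense that the training instances are kept and searched at prediction time; only the count tables are precomputed.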
Pages: 2038-2048
Page count: 11
Related papers
23 in total
  • [1] Aha DW, ARTIF INTELL REV, V11
  • [2] [Anonymous], P 26 ANN INT ACM SIG
  • [3] [Anonymous], 1997, Proceedings of the fourteenth international conference on machine learning, DOI 10.1016/J.ESWA.2008.05.026
  • [4] [Anonymous], 2004, P 21 INT C MACH LEAR
  • [5] Learning multi-label scene classification
    Boutell, MR
    Luo, JB
    Shen, XP
    Brown, CM
    [J]. PATTERN RECOGNITION, 2004, 37 (09) : 1757 - 1771
  • [6] Clare R. D., 2001, Lecture Notes in Computer Science, V2168, P42, DOI 10.1007/3-540-44794-6_4
  • [7] MAXIMUM LIKELIHOOD FROM INCOMPLETE DATA VIA EM ALGORITHM
    DEMPSTER, AP
    LAIRD, NM
    RUBIN, DB
    [J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01) : 1 - 38
  • [8] Dumais S., 1998, Proceedings of the 1998 ACM CIKM International Conference on Information and Knowledge Management, P148, DOI 10.1145/288627.288651
  • [9] Elisseeff A, 2002, ADV NEUR IN, V14, P681
  • [10] Freund Y, 1999, MACHINE LEARNING, PROCEEDINGS, P124