KMT-PLL: K-Means Cross-Attention Transformer for Partial Label Learning

Cited by: 9
Authors
Fan, Jinfu [1]
Huang, Linqing [2]
Gong, Chaoyu [2]
You, Yang [2]
Gan, Min [1]
Wang, Zhongjie [3]
Affiliations
[1] Qingdao Univ, Coll Comp Sci & Technol, Qingdao 266071, Peoples R China
[2] Natl Univ Singapore, Dept Comp Sci, Singapore 119077, Singapore
[3] Tongji Univ, Dept Control Sci & Engn, Coll Elect & Informat Engn, Shanghai 201804, Peoples R China
Keywords
Cluster centers; K-means cross-attention; partial label learning (PLL); vision transformer (ViT)
DOI
10.1109/TNNLS.2023.3347792
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Partial label learning (PLL) studies the problem of learning an instance classifier from examples annotated with a set of candidate labels, among which only one is correct. While recent work has demonstrated that the vision transformer (ViT) achieves good results when trained on clean data, its application to PLL remains limited and challenging. To address this issue, we rethink the relationship between instances and object queries and propose the K-means cross-attention transformer for PLL (KMT-PLL), which continuously learns cluster centers that can be used for downstream disambiguation. More specifically, K-means cross-attention operates as a clustering process that learns cluster centers representing the label classes. This makes the similarity between instances and labels measurable, which in turn allows noisy labels to be detected. Furthermore, we propose a new corrected cross-entropy formulation that weights candidate labels according to instance-to-label relevance to guide the training of the instance classifier. As training proceeds, the ground-truth label is progressively identified, and the refined labels and cluster centers in turn help to improve the classifier. Simulation results demonstrate the advantage of KMT-PLL and its suitability for PLL.
Pages: 2789-2800
Page count: 12
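The abstract describes two mechanisms: a K-means-style cross-attention step in which per-class cluster centers attend over instance features with hard assignments, and a corrected cross entropy that re-weights each instance's candidate labels by its relevance to the class centers. The following minimal PyTorch sketch illustrates that general idea only; it is not the authors' implementation, and all shapes, names, and hyperparameters below are assumptions made for illustration.

# Illustrative sketch (not the authors' code) of the two ideas in the abstract:
# (1) a K-means-like cross-attention update of per-class cluster centers over
#     instance tokens with hard assignments, and
# (2) a corrected cross entropy that weights candidate labels by
#     instance-to-center relevance.
import torch
import torch.nn.functional as F

def kmeans_cross_attention(centers, tokens):
    """One K-means-like update of class centers.

    centers: (C, D) cluster centers, one per label class.
    tokens:  (N, D) instance token features.
    """
    # Hard assignment: each token goes to its most similar center (E-step).
    sim = tokens @ centers.t()                                        # (N, C)
    assign = F.one_hot(sim.argmax(dim=1), centers.size(0)).float()    # (N, C)
    # Center update: mean of the tokens assigned to each center (M-step),
    # with a small floor on the counts to keep empty clusters stable.
    counts = assign.sum(dim=0).clamp(min=1e-6).unsqueeze(1)           # (C, 1)
    new_centers = (assign.t() @ tokens) / counts                      # (C, D)
    return new_centers, sim

def corrected_cross_entropy(logits, sim, candidate_mask):
    """Cross entropy with candidate labels weighted by instance-to-center relevance.

    logits:         (N, C) classifier outputs.
    sim:            (N, C) instance-to-center similarities.
    candidate_mask: (N, C) binary mask, 1 for labels in the candidate set.
    """
    # Relevance weights: softmax over similarities restricted to candidate labels.
    weights = torch.softmax(sim.masked_fill(candidate_mask == 0, float('-inf')), dim=1)
    log_probs = F.log_softmax(logits, dim=1)
    return -(weights * log_probs).sum(dim=1).mean()

# Toy usage with random data (8 instances, 16-dim features, 4 classes).
torch.manual_seed(0)
centers = torch.randn(4, 16)
tokens = torch.randn(8, 16)
logits = torch.randn(8, 4)
candidate_mask = (torch.rand(8, 4) > 0.5).float()
candidate_mask[torch.arange(8), torch.randint(0, 4, (8,))] = 1.0  # ensure one candidate
centers, sim = kmeans_cross_attention(centers, tokens)
loss = corrected_cross_entropy(logits, sim, candidate_mask)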