Semantic Feature Graph Consistency with Contrastive Cluster Assignments for Multilingual Document Clustering

Cited by: 1
Authors
Sun, Teng [1]
Shu, Zhenqiu [1]
Huang, Yuxin [1]
Wang, Hongbin [1]
Yu, Zhengtao [1]
Affiliations
[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming, Yunnan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Multilingual document clustering; contrastive learning; semantic feature; similarity; consistent semantic;
DOI
10.1145/3708887
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Multilingual document clustering (MDC) aims to partition multilingual documents into distinct clusters by topic category in an unsupervised manner. However, existing MDC methods still suffer from several limitations in practical tasks. First, most of them optimize multiple objectives within the same feature space, which creates a conflict between learning consistent shared semantics and reconstructing inconsistent view-specific information. Second, several methods directly integrate information from multilingual documents during the fusion stage and thereby overlook the semantic differences between features of different languages. To address these problems, we propose a novel multi-view learning method for MDC, called SFGC3A. Specifically, SFGC3A implements the consistency objective and the reconstruction objective in different feature spaces, thus avoiding conflicts between consistency learning and inconsistency reconstruction. We then design semantic feature graph consistency and semantic label consistency modules to further explore consistent semantic information among multilingual documents, thereby reducing the semantic differences among language views. Extensive experiments on several multilingual document datasets show the effectiveness of SFGC3A in MDC tasks. The source code for this work will be released later.
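The abstract does not spell out the SFGC3A losses, so the following is only a rough illustration of what contrasting cluster assignments across two language views can look like. It is a minimal NumPy sketch under stated assumptions: the function names, the InfoNCE form, and the temperature value are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cluster_contrastive_loss(p_a, p_b, temperature=0.5):
    """Illustrative InfoNCE loss over cluster columns (an assumption, not the paper's loss).

    p_a, p_b: (n_docs, n_clusters) soft cluster assignments of the same
    documents in two language views; each row sums to 1.
    """
    # Describe each cluster by its column: how strongly every document belongs to it.
    c_a = p_a.T / np.linalg.norm(p_a.T, axis=1, keepdims=True)
    c_b = p_b.T / np.linalg.norm(p_b.T, axis=1, keepdims=True)
    sim = c_a @ c_b.T / temperature  # (k, k) scaled cosine similarities

    # Positive pair: the same cluster index in both views (the diagonal).
    # Every other cluster in the opposite view acts as a negative.
    log_prob_ab = np.log(softmax(sim, axis=1))
    log_prob_ba = np.log(softmax(sim.T, axis=1))
    k = sim.shape[0]
    return -(np.trace(log_prob_ab) + np.trace(log_prob_ba)) / (2 * k)


# Toy usage: six documents, three topics, two views with identical assignments.
labels = np.array([0, 0, 1, 1, 2, 2])
p_view = softmax(5.0 * np.eye(3)[labels], axis=1)
loss_aligned = cluster_contrastive_loss(p_view, p_view)
# Permuting the cluster columns of one view breaks the cross-view agreement.
loss_permuted = cluster_contrastive_loss(p_view, p_view[:, [1, 2, 0]])
```

In this toy setup the loss is lower when both views assign documents to the same cluster indices than when one view's clusters are permuted, which is the kind of cross-view agreement a consistency objective rewards.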
Pages: 22